[jira] [Assigned] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma reassigned HIVE-24223:
------------------------------------
    Assignee: Ashish Sharma

> Insert into Hive tables doesn't work for decimal numbers with no preceding
> digit before the decimal
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24223
>                 URL: https://issues.apache.org/jira/browse/HIVE-24223
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: All Versions
>            Reporter: Kriti Jha
>            Assignee: Ashish Sharma
>            Priority: Minor
>
> Any insert into a Hive table of a decimal literal with no digit before the
> DOT ('.') fails with a ParseException, as shown below:
> --
> hive> create table test_dec(id decimal(10,8));
> hive> insert into test_dec values (-.5);
> NoViableAltException(16@[412:1: atomExpression : ( constant | (
> intervalExpression )=> intervalExpression | castExpression |
> extractExpression | floorExpression | caseExpression | whenExpression | (
> subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR
> TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function |
> tableOrColumn | expressionsInParenthesis[true] );])
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380)
>   ...
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
>   ...
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in
> expression specification
> --
> It appears to come from the Lexer, where the token types are defined and the
> definition of 'Number' comes into play:
> --
> Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? ;
> --
> https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469
> However, the below works:
> insert into test_dec values ('-.5');
> insert into test_dec values (-0.5);

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
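[Editorial note] The quoted 'Number' lexer rule requires at least one digit before the optional DOT, which is why '.5' never tokenizes as a Number (the leading '-' is a separate MINUS token either way). A minimal sketch of the rule's behavior, with the regex being my own translation of the grammar rule rather than Hive code:

```python
import re

# The ANTLR rule from HiveLexerParent.g, translated to a regex for
# illustration (this regex is my translation, not Hive code):
#   Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent )? ;
NUMBER = re.compile(r"^\d+(\.\d*([eE][+-]?\d+)?|[eE][+-]?\d+)?$")

# The (Digit)+ prefix is mandatory, so a literal with no integer part
# never matches; '-' is tokenized separately and plays no role here.
for literal in ["0.5", "5.", "5e3", ".5"]:
    print(literal, bool(NUMBER.match(literal)))
```

A fix in the direction the report implies would add an alternative that allows an empty integer part when a fractional part is present, e.g. a `DOT (Digit)+ (Exponent)?` branch in the rule.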
[jira] [Work logged] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file
[ https://issues.apache.org/jira/browse/HIVE-8950?focusedWorklogId=494221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494221 ]

ASF GitHub Bot logged work on HIVE-8950:
----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1353:
URL: https://github.com/apache/hive/pull/1353#issuecomment-703016434

This pull request has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs. Feel
free to reach out on the d...@hive.apache.org list if the patch is in need of
reviews.

This is an automated message from the Apache Git Service. To respond to the
message, please log on to GitHub and use the URL above to go to the specific
comment. For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 494221)
    Time Spent: 20m (was: 10m)

> Add support in ParquetHiveSerde to create table schema from a parquet file
> --------------------------------------------------------------------------
>
>                 Key: HIVE-8950
>                 URL: https://issues.apache.org/jira/browse/HIVE-8950
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashish Singh
>            Assignee: Ashish Singh
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-8950.1.patch, HIVE-8950.10.patch, HIVE-8950.11.patch,
>                      HIVE-8950.2.patch, HIVE-8950.3.patch, HIVE-8950.4.patch,
>                      HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch,
>                      HIVE-8950.8.patch, HIVE-8950.9.patch, HIVE-8950.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PARQUET-76 and PARQUET-47 ask for creating parquet-backed tables without
> having to specify the column names and types. As parquet files store their
> schema in the footer, it is possible to generate the Hive schema from a
> parquet file's metadata. This will improve usability of parquet-backed
> tables.
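[Editorial note] The idea in the issue description is to read column names and types from the parquet footer and turn them into a CREATE TABLE column list. A hedged sketch of that mapping in Python; the type-name table and function names below are illustrative assumptions, not ParquetHiveSerde's actual conversion logic:

```python
# Illustrative parquet-primitive-to-Hive type map (an assumption for this
# sketch; the real SerDe handles more types and nested groups).
PARQUET_TO_HIVE = {
    "boolean": "boolean",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "binary(utf8)": "string",
}

def hive_columns(parquet_fields):
    """parquet_fields: list of (name, parquet_type) as read from a footer."""
    return ", ".join(f"`{name}` {PARQUET_TO_HIVE[ptype]}"
                     for name, ptype in parquet_fields)

# Build a DDL statement from footer metadata instead of hand-written columns.
ddl = ("CREATE TABLE t ("
       + hive_columns([("id", "int64"), ("name", "binary(utf8)")])
       + ") STORED AS PARQUET")
print(ddl)  # CREATE TABLE t (`id` bigint, `name` string) STORED AS PARQUET
```

In the real feature the `(name, parquet_type)` pairs would come from the file's footer metadata, which is exactly what makes the table definition derivable without user-specified columns.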
[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory
[ https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=494218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494218 ]

ASF GitHub Bot logged work on HIVE-23835:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1249:
URL: https://github.com/apache/hive/pull/1249

Issue Time Tracking
-------------------
    Worklog Id: (was: 494218)
    Time Spent: 1h 20m (was: 1h 10m)

> Repl Dump should dump function binaries to staging directory
> ------------------------------------------------------------
>
>                 Key: HIVE-23835
>                 URL: https://issues.apache.org/jira/browse/HIVE-23835
>             Project: Hive
>          Issue Type: Task
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch,
>                      HIVE-23835.03.patch, HIVE-23835.04.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When a Hive function's binaries are on the source HDFS, repl dump should
> copy them to the staging location, removing the requirement that the target
> cluster have cross-cluster visibility into the source filesystem.
[jira] [Work logged] (HIVE-23977) Consolidate partition fetch to one place
[ https://issues.apache.org/jira/browse/HIVE-23977?focusedWorklogId=494219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494219 ]

ASF GitHub Bot logged work on HIVE-23977:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1354:
URL: https://github.com/apache/hive/pull/1354#issuecomment-703016426

This pull request has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs. Feel
free to reach out on the d...@hive.apache.org list if the patch is in need of
reviews.

Issue Time Tracking
-------------------
    Worklog Id: (was: 494219)
    Time Spent: 20m (was: 10m)

> Consolidate partition fetch to one place
> ----------------------------------------
>
>                 Key: HIVE-23977
>                 URL: https://issues.apache.org/jira/browse/HIVE-23977
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Status: Patch Available  (was: Open)

> Optimise CuckooSetBytes
> -----------------------
>
>                 Key: HIVE-24205
>                 URL: https://issues.apache.org/jira/browse/HIVE-24205
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png,
>                      vectorized.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for
> lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add boundary conditions on "length" with
> the min/max length stored alongside the hashes (ref:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> This would significantly reduce the number of hash computations that need
> to happen, e.g.
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]
[jira] [Commented] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206560#comment-17206560 ]

Mustafa Iman commented on HIVE-24205:
-------------------------------------
[~hashutosh] [~rajesh.balamohan] can you take a look?
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: bench.png
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: vectorized.patch
[jira] [Commented] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206559#comment-17206559 ]

Mustafa Iman commented on HIVE-24205:
-------------------------------------
I added a simple max/min length check in CuckooSetBytes#lookup. The attached
file shows some benchmark results.

*TPCH_Q12* is a select with an IN clause and a join afterwards. Selectivity
of the filter is 30%.

The *Synthetic* query is a simple select with an IN clause. IN is over two of
the longest comment fields (both 72 characters wide), so the filter is very
selective, passing only about 2% of rows:

select o_orderkey, o_comment from orders where o_comment in ('jole quickly furiously bold escapades: regular accounts play regular req', 's foxes. regular warhorses detect fluffily. carefull y regular tithes amo', 'grate ironic, pending sauternes. deposits do are slyly. carefully ironic')

The *Synthetic Wide* query is the same as Synthetic except the IN clause is
over one shortest-length and one longest-length comment. The filter is still
selective (about 4% of rows pass), but our optimization cannot eliminate any
tuples:

select o_orderkey, o_comment from orders where o_comment in ('jole quickly furiously bold escapades: regular accounts play regular req', 'ts nag furiously. even');

The patch outperforms the original code by 50% on the Synthetic query. For
TPCH Q12 there is no meaningful difference between the two runs. My
conclusion is that the optimization has very low overhead and gives a
significant perf improvement in certain cases.

I also implemented a vectorized version of the early return from CuckooSet;
it is attached as vectorized.patch. However, in all cases the simpler patch
outperforms the vectorized one.
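[Editorial note] The max/min length check discussed above can be sketched as follows. This is a hedged Python toy, not the actual Java CuckooSetBytes code; the class and field names are illustrative:

```python
class LengthBoundedSet:
    """Toy stand-in for CuckooSetBytes' early exit: record the min and max
    member length at build time, so a lookup whose key length falls outside
    that range returns False without computing any hash."""

    def __init__(self, values):
        self._set = set(values)            # real code: cuckoo hash tables
        self._min = min(map(len, values))  # shortest member length
        self._max = max(map(len, values))  # longest member length
        self.hash_calls = 0                # instrumentation: hashes performed

    def lookup(self, key):
        if not (self._min <= len(key) <= self._max):
            return False                   # boundary check: no hash computed
        self.hash_calls += 1               # only now do we pay for hashing
        return key in self._set

s = LengthBoundedSet(["apple", "banana"])
print(s.lookup("kiwi"), s.hash_calls)      # rejected by length, 0 hashes
print(s.lookup("banana"), s.hash_calls)    # in range, 1 hash performed
```

This also shows why the benchmark's "Synthetic Wide" case gains nothing: once the IN list contains both a very short and a very long member, nearly every probe length falls inside [min, max] and the check never fires.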
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24205:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Work logged] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?focusedWorklogId=494204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494204 ]

ASF GitHub Bot logged work on HIVE-24205:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 23:37
        Start Date: 02/Oct/20 23:37
Worklog Time Spent: 10m

Work Description: mustafaiman opened a new pull request #1549:
URL: https://github.com/apache/hive/pull/1549
Change-Id: I86a28b27859824daf381d5581241fd683d5c85f0

Issue Time Tracking
-------------------
        Worklog Id: (was: 494204)
Remaining Estimate: 0h
        Time Spent: 10m
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: (was: Screen Shot 2020-10-02 at 4.15.32 PM.png)
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: Screen Shot 2020-10-02 at 4.15.32 PM.png
[jira] [Assigned] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman reassigned HIVE-24205:
-----------------------------------
    Assignee: Mustafa Iman
[jira] [Commented] (HIVE-24202) Clean up local HS2 HMS cache code (II)
[ https://issues.apache.org/jira/browse/HIVE-24202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206408#comment-17206408 ]

Jesus Camacho Rodriguez commented on HIVE-24202:
------------------------------------------------
[~vgarg], could you take a look? Thanks

> Clean up local HS2 HMS cache code (II)
> --------------------------------------
>
>                 Key: HIVE-24202
>                 URL: https://issues.apache.org/jira/browse/HIVE-24202
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up for HIVE-24183 (split into different JIRAs).
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494052 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:24
        Start Date: 02/Oct/20 16:24
Worklog Time Spent: 10m

Work Description: dongjoon-hyun edited a comment on pull request #1545:
URL: https://github.com/apache/hive/pull/1545#issuecomment-702828726

Thank you so much for merging, @sunchao !

Issue Time Tracking
-------------------
    Worklog Id: (was: 494052)
    Time Spent: 2h (was: 1h 50m)

> Upgrade ORC to 1.5.12
> ---------------------
>
>                 Key: HIVE-24222
>                 URL: https://issues.apache.org/jira/browse/HIVE-24222
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494051 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:23
        Start Date: 02/Oct/20 16:23
Worklog Time Spent: 10m

Work Description: dongjoon-hyun commented on pull request #1545:
URL: https://github.com/apache/hive/pull/1545#issuecomment-702828726

Thank you so much, @sunchao !

Issue Time Tracking
-------------------
    Worklog Id: (was: 494051)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494044 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:18
        Start Date: 02/Oct/20 16:18
Worklog Time Spent: 10m

Work Description: sunchao merged pull request #1545:
URL: https://github.com/apache/hive/pull/1545

Issue Time Tracking
-------------------
    Worklog Id: (was: 494044)
    Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?focusedWorklogId=494041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494041 ]

ASF GitHub Bot logged work on HIVE-24210:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:12
        Start Date: 02/Oct/20 16:12
Worklog Time Spent: 10m

Work Description: vineetgarg02 merged pull request #1536:
URL: https://github.com/apache/hive/pull/1536

Issue Time Tracking
-------------------
    Worklog Id: (was: 494041)
    Time Spent: 20m (was: 10m)

> PartitionManagementTask fails if one of tables dropped after fetching
> TableMeta
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24210
>                 URL: https://issues.apache.org/jira/browse/HIVE-24210
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PMT fetches tableMeta based on the configured dbPattern & tablePattern:
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L125
> If one of those tables is dropped before Auto Partition Discovery or MSCK is
> scheduled, the entire PMT stops because of the exception below, even though
> MSCK could still run for the other, valid tables.
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: hive.default.test_table table not found
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>   at org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}
> The exception is thrown from here:
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L130
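[Editorial note] The bug pattern described above, where one missing table aborts a whole batch, is usually fixed by catching the not-found exception per table and continuing. A hedged Python sketch of that shape; the function and exception names are illustrative stand-ins, not the actual PartitionManagementTask code:

```python
class NoSuchObjectException(Exception):
    """Stand-in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException."""

def run_partition_discovery(table_names, get_table, process):
    """Process each table from the earlier TableMeta listing. A table that
    was dropped in the meantime is skipped instead of aborting the task."""
    skipped = []
    for name in table_names:
        try:
            process(get_table(name))      # may raise if the table was dropped
        except NoSuchObjectException:
            skipped.append(name)          # log and continue with the rest
    return skipped

# Simulate "test_table" being dropped between listing and processing.
tables = {"orders": "orders-meta"}
def get_table(name):
    if name not in tables:
        raise NoSuchObjectException(name)
    return tables[name]

processed = []
skipped = run_partition_discovery(["orders", "test_table"], get_table, processed.append)
print(processed, skipped)  # ['orders-meta'] ['test_table']
```

The key design point is that the try/except sits inside the loop, so the failure of one table is isolated from the remainder of the batch.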
[jira] [Resolved] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg resolved HIVE-24210. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks [~nareshpr]
[jira] [Commented] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206302#comment-17206302 ] Vineet Garg commented on HIVE-24210: +1. LGTM
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494022 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:49 Start Date: 02/Oct/20 15:49 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498905028 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { + sd.getBucketCols().remove(oldColumnName); + sd.getBucketCols().add(desc.getNewColumnName()); Review comment: newColumnName is converted toLowerCase in query planning while populating "desc" but to be fail safe i have added toLowerCase() here also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494022) Time Spent: 2h 40m (was: 2.5h) > ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names > --- > > Key: HIVE-22826 > URL: https://issues.apache.org/jira/browse/HIVE-22826 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Attachments: unitTest.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Compaction for tables where a bucketed column has been renamed fails since > the list of bucketed columns in the StorageDescriptor doesn't get updated > when the column is renamed, therefore we can't recreate the table correctly > during compaction. > Attached a unit test that fails. > NO PRECOMMIT TESTS -- This message was sent by Atlassian Jira (v8.3.4#803005)
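The change under review updates the bucketed-column list when a column is renamed, lower-casing names because Hive column names are case-insensitive. A minimal sketch of that update logic, detached from Hive's StorageDescriptor (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HIVE-22826 fix: on column rename, the bucketed-column list must
// be updated too, with names compared in lower case. Illustrative only.
public class BucketColRenameSketch {
    static List<String> renameBucketCol(List<String> bucketCols, String oldName, String newName) {
        List<String> result = new ArrayList<>(bucketCols);
        // Bucket columns are assumed stored lower-case; rename only if present.
        if (result.remove(oldName.toLowerCase())) {
            result.add(newName.toLowerCase());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(renameBucketCol(List.of("key"), "key", "Serial_Num")); // [serial_num]
    }
}
```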
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494019 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:48 Start Date: 02/Oct/20 15:48 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498904232 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { Review comment: As per HIVE column contract it should be case in-sensitive. But it is not handled properly in query planning of "alter table {tablename} change". So I have added toLowerCase() in query planning also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 494019) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494016 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:45 Start Date: 02/Oct/20 15:45 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498902660 ## File path: ql/src/test/queries/clientpositive/alter_bucketedtable_change_column.q ## @@ -0,0 +1,10 @@ +--! qt:dataset:src +create table alter_bucket_change_col_t1(key string, value string) partitioned by (ds string) clustered by (key) into 10 buckets; + +describe formatted alter_bucket_change_col_t1; + +-- Test changing name of bucket column + +alter table alter_bucket_change_col_t1 change key keys string; Review comment: Added "Serial_Num", which covers lower case, upper case, and a special character. Issue Time Tracking --- Worklog Id: (was: 494016) Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494015 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:44 Start Date: 02/Oct/20 15:44 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498902147 ## File path: ql/src/test/queries/clientpositive/alter_numbuckets_partitioned_table_h23.q ## @@ -52,6 +52,12 @@ alter table tst1_n1 clustered by (value) into 12 buckets; describe formatted tst1_n1; +-- Test changing name of bucket column + +alter table tst1_n1 change key keys string; + +describe formatted tst1_n1; Review comment: Done Issue Time Tracking --- Worklog Id: (was: 494015) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494014 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:44 Start Date: 02/Oct/20 15:44 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498901990 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -130,6 +130,11 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam throw new InvalidOperationException("Invalid column " + validate); } +// Validate bucketedColumns in new table +if (!MetaStoreServerUtils.validateBucketColumns(newt.getSd())) { + throw new InvalidOperationException("Bucket column doesn't match with any table columns"); Review comment: 1. Converted return type to List. 2. Added Log.error() along with column name. 3. Added column to exception also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 494014) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494013 ] ASF GitHub Bot logged work on HIVE-24222: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:43 Start Date: 02/Oct/20 15:43 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1545: URL: https://github.com/apache/hive/pull/1545#issuecomment-702807686 Thank you, @pgaref and @sunchao . It's passed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494013) Time Spent: 1.5h (was: 1h 20m) > Upgrade ORC to 1.5.12 > - > > Key: HIVE-24222 > URL: https://issues.apache.org/jira/browse/HIVE-24222 > Project: Hive > Issue Type: Improvement > Components: ORC >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494012 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:42 Start Date: 02/Oct/20 15:42 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900786 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); Review comment: Done
Issue Time Tracking --- Worklog Id: (was: 494012) Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494011 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:42 Start Date: 02/Oct/20 15:42 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900655 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ Review comment: Done
Issue Time Tracking --- Worklog Id: (was: 494011) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494010 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:41 Start Date: 02/Oct/20 15:41 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900535 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ + return columnNames.containsAll(sd.getBucketCols().stream().map(String::toLowerCase).collect(Collectors.toList())); +} else if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isEmpty(columnNames)) { + return false; +} else { + return true; +} + } + + /** + * Generate column name list from the fieldSchema list + * @param cols fieldSchema list + * @return column name list + */ + public static List<String> getColumnNames(List<FieldSchema> cols) { +if (CollectionUtils.isNotEmpty(cols)) { + return cols.stream().map(FieldSchema::getName).collect(Collectors.toList()); Review comment: Expected column name in lower case. But in order to be fail-safe, added toLowerCase() here also.
Issue Time Tracking --- Worklog Id: (was: 494010) Time Spent: 1.5h (was: 1h 20m)
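The validateBucketColumns logic discussed in the review above can be restated as a self-contained sketch: bucket columns are valid when the list is empty or every entry, lower-cased, is a table column. This version works on plain string lists rather than Hive's StorageDescriptor, so it is illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified restatement of the bucket-column validation: valid when there are
// no bucket columns, or every bucket column (lower-cased) is a table column.
public class BucketColValidationSketch {
    static boolean validateBucketColumns(List<String> bucketCols, List<String> columnNames) {
        if (bucketCols == null || bucketCols.isEmpty()) {
            return true;  // nothing to validate
        }
        if (columnNames == null || columnNames.isEmpty()) {
            return false; // bucket columns exist but the table has no columns
        }
        return columnNames.containsAll(
            bucketCols.stream().map(String::toLowerCase).collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> cols = List.of("key", "value");
        System.out.println(validateBucketColumns(List.of("KEY"), cols));  // true
        System.out.println(validateBucketColumns(List.of("gone"), cols)); // false
    }
}
```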
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=494005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494005 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:38 Start Date: 02/Oct/20 15:38 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-702804902 @vpnvishv , could you please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494005) Time Spent: 50m (was: 40m) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 50m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
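The marker scheme proposed above can be illustrated with a small in-memory sketch, where a Map stands in for the TXN_COMPONENTS table. This is not the metastore's actual SQL or schema; it only shows the lifecycle of the special marker entry.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-21052 proposal: a placeholder entry is written for a
// transaction at open time and replaced with a real partition entry when
// addPartitions runs, so an abort in between still leaves something to clean.
public class TxnMarkerSketch {
    static final String MARKER = "__OPEN_TXN_MARKER__";
    // txnId -> component entry (marker or partition name); simplified to one entry per txn
    private final Map<Long, String> txnComponents = new HashMap<>();

    void openTxn(long txnId) {
        txnComponents.put(txnId, MARKER); // special marker written at openTxn
    }

    void addPartitions(long txnId, String partition) {
        txnComponents.put(txnId, partition); // marker replaced by the real entry
    }

    // Cleaner side: an aborted txn that still carries the marker needs jobs
    // generated for every possible partition, not just the recorded ones.
    boolean needsFullClean(long txnId) {
        return MARKER.equals(txnComponents.get(txnId));
    }

    public static void main(String[] args) {
        TxnMarkerSketch t = new TxnMarkerSketch();
        t.openTxn(7);
        System.out.println(t.needsFullClean(7)); // true: abort here would still be cleaned
        t.addPartitions(7, "ds=2020-10-02");
        System.out.println(t.needsFullClean(7)); // false: normal per-partition cleanup
    }
}
```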
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493985 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:04 Start Date: 02/Oct/20 15:04 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498878756 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [new XML element definitions not preserved in this archive] Review comment: I'll double check it, I remember having some problem with another textual datatype, I can't remember which one was that, that's why MEDIUMTEXT was chosen. Issue Time Tracking --- Worklog Id: (was: 493985) Time Spent: 1h 20m (was: 1h 10m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information.
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493984 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:03 Start Date: 02/Oct/20 15:03 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498877894 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [new XML element definitions not preserved in this archive] Review comment: The signature is already parsed by the time the procedure is being created. We would need to drop that information, get back the textual representation of the signature to store it in HMS, and reparse it on the client side when someone calls the procedure. That's maybe not a big deal, but still unnecessary to parse it twice. Storing it in a structured way also ensures some degree of validity: you can't store a syntactically incorrect signature if we store it in a structured way. I'm not sure they never participate in a query. If one wants to discover the stored procedures currently stored in a DB and find out what data they operate on, they would need to do some clumsy string manipulation on the signature. Considering that other DB engines also store this information separately, I would like to keep it as it is for now and see how it works in practice. Later on, when we have multi-language support, we can revisit this issue.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493984) Time Spent: 1h 10m (was: 1h) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h 10m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493978 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:51 Start Date: 02/Oct/20 14:51 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498868959 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: You mean putting (1: string catName, 2: string dbName, 3: string funcName) into a request object? I can do that. But if we have only one parameter, like in the last case that would be an overkill in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493978) Time Spent: 1h (was: 50m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493976 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:48 Start Date: 02/Oct/20 14:48 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498868959 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: You mean putting (1: string catName, 2: string dbName, 3: string funcName) into a request object? I can do that. But if we have only one parameter, like in the last case, that would be overkill in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493976) Time Spent: 50m (was: 40m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493974 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:46 Start Date: 02/Oct/20 14:46 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498867496 ## File path: standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql ## @@ -786,6 +786,35 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( CREATE INDEX "POLICY_IDX" ON "APP"."REPLICATION_METRICS" ("RM_POLICY"); CREATE INDEX "DUMP_IDX" ON "APP"."REPLICATION_METRICS" ("RM_DUMP_EXECUTION_ID"); +-- Create stored procedure tables +CREATE TABLE "APP"."STORED_PROCS" ( + "SP_ID" BIGINT NOT NULL, + "CREATE_TIME" INTEGER NOT NULL, + "LAST_ACCESS_TIME" INTEGER NOT NULL, Review comment: the intention was to have something that represents the last modification date (maybe the name was chosen poorly), but OK, I'll remove it; it is not used This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493974) Time Spent: 40m (was: 0.5h) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 40m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493970 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:41 Start Date: 02/Oct/20 14:41 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498830563 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: could you please follow the convention of other methods and define a struct for the requests arguments ## File path: standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql ## @@ -786,6 +786,35 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( CREATE INDEX "POLICY_IDX" ON "APP"."REPLICATION_METRICS" ("RM_POLICY"); CREATE INDEX "DUMP_IDX" ON "APP"."REPLICATION_METRICS" ("RM_DUMP_EXECUTION_ID"); +-- Create stored procedure tables +CREATE TABLE "APP"."STORED_PROCS" ( + "SP_ID" BIGINT NOT NULL, + "CREATE_TIME" INTEGER NOT NULL, + 
"LAST_ACCESS_TIME" INTEGER NOT NULL, Review comment: I think we should only add fields which are actually useful and in use - because right now the access time would not be updated at all, I don't think we should add it. ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [XML column definitions stripped by the mail archive] Review comment: I think instead of storing the return_type/argument types and such in the metastore - as they would never participate in a query or anything "useful"; they will just travel as payload in the messages. Given the fact that they are effectively implicit data which can be figured out from the function definition - I think we may leave it to the execution engine; it should be able to figure it out (since it should be able to use it). Optionally, to give ourselves (and users) some clarity, we could add a "signature" string to the table - which could provide a human-readable signature ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [XML column definitions stripped by the mail archive] Review comment: this is the first occurrence of MEDIUMTEXT in package.jdo - I don't know how well that will work; we had quite a few problems with "long" tableproperty values - and PARAM_VALUE was updated to use CLOB in oracle/etc. The most important thing would be to make sure that we can store the procedure in all supported metastore databases - if possible this should also be tested in some way (at least by hand) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493970) Time Spent: 0.5h (was: 20m) > HMS storage backend for HPL/SQL stored procedur
[jira] [Assigned] (HIVE-24226) Avoid Copy of Bytes in Protobuf BinaryWriter
[ https://issues.apache.org/jira/browse/HIVE-24226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24226: - > Avoid Copy of Bytes in Protobuf BinaryWriter > > > Key: HIVE-24226 > URL: https://issues.apache.org/jira/browse/HIVE-24226 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > {code:java|title=ProtoWriteSupport.java} > class BinaryWriter extends FieldWriter { > @Override > final void writeRawValue(Object value) { > ByteString byteString = (ByteString) value; > Binary binary = Binary.fromConstantByteArray(byteString.toByteArray()); > recordConsumer.addBinary(binary); > } > } > {code} > {{toByteArray()}} creates a copy of the buffer. There is already support > with Parquet and Protobuf to pass a ByteBuffer instead, which avoids the copy. -- This message was sent by Atlassian Jira (v8.3.4#803005)
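The cost difference being described (materializing a fresh byte array versus handing over a view of the existing buffer) can be illustrated generically. The sketch below uses Python's bytes/memoryview as a stand-in; it is not the Parquet/Protobuf API, and the function names are made up for illustration:

```python
def to_binary_with_copy(payload: bytes) -> bytes:
    # Analogous to ByteString.toByteArray(): allocates and fills a new array.
    return bytes(bytearray(payload))

def to_binary_zero_copy(payload: bytes) -> memoryview:
    # Analogous to handing the writer a ByteBuffer view: no new backing array.
    return memoryview(payload)

data = b"row-value"
copied = to_binary_with_copy(data)
view = to_binary_zero_copy(data)
assert copied == data and copied is not data  # equal content, separate buffer
assert view.obj is data                       # shares the original buffer
```

The zero-copy variant matters most for wide binary columns, where the per-row copy dominates the writer's allocation cost.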
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493956 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:45 Start Date: 02/Oct/20 13:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1415: URL: https://github.com/apache/hive/pull/1415#issuecomment-702742918 created same for master: https://github.com/apache/hive/pull/1548 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493956) Time Spent: 40m (was: 0.5h) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 40m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted, it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493954 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:45 Start Date: 02/Oct/20 13:45 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #1548: URL: https://github.com/apache/hive/pull/1548 ### What changes were proposed in this pull request? Below changes are only with respect to branch-3.1. Design: taken from https://issues.apache.org/jira/secure/attachment/12954375/Aborted%20Txn%20w_Direct%20Write.pdf **Overview:** 1. add a dummy row to TXN_COMPONENTS with operation type 'p' in enqueueLockWithRetry, which will be removed in addDynamicPartition 2. If the txn is aborted at any point, this dummy entry will block the initiator from removing this txnId from TXNS 3. Initiator will add a row in COMPACTION_QUEUE (with type 'p') for the above aborted txn with the state as READY_FOR_CLEANING; at any time there will be a single entry of this type for a table in COMPACTION_QUEUE. 4. Cleaner will directly pick up the above request, and process it via the new cleanAborted code path (scan all partitions and remove aborted dirs); once successful, the cleaner will remove the dummy row from TXN_COMPONENTS **Cleaner Design:** - We are keeping the cleaner single-threaded, and this new type of cleanup will be handled similar to any regular cleanup **Aborted dirs cleanup:** - In p-type cleanup, cleaner will iterate over all the partitions and remove all delta/base dirs with the given aborted writeId list - added cleanup of aborted base/delta in the worker also **TXN_COMPONENTS cleanup:** - If successful, p-type entry will be removed from TXN_COMPONENTS during addDynamicPartitions - If aborted, cleaner will clean in markCleaned after successful processing of p-type cleanup **TXNS cleanup:** - No change, will be cleaned up by the initiator ### Why are the changes needed? To fix the above-mentioned issue. 
### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? unit-tests added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493954) Time Spent: 0.5h (was: 20m) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. > As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted, it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
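The marker-row protocol from the proposal (a dummy 'p' entry written at openTxn, swapped for real partition entries at addDynamicPartitions, and otherwise left behind for the cleaner to find) can be sketched as a minimal model. The class and method names below are illustrative only, not the actual Hive metastore code:

```python
# Minimal sketch of the TXN_COMPONENTS marker protocol for aborted
# transactions; 'p' marks "partitions not yet registered".
class TxnComponents:
    def __init__(self):
        self.rows = {}  # txn_id -> 'p' marker or list of partition entries

    def open_txn(self, txn_id):
        # enqueueLockWithRetry: add the dummy 'p' marker row.
        self.rows[txn_id] = "p"

    def add_partitions(self, txn_id, partitions):
        # addDynamicPartitions: replace the marker with real partition entries.
        self.rows[txn_id] = list(partitions)

    def aborted_needing_cleanup(self):
        # The initiator finds txns still carrying the 'p' marker (i.e. aborted
        # before addPartitions) and queues READY_FOR_CLEANING work for them.
        return [t for t, op in self.rows.items() if op == "p"]

tc = TxnComponents()
tc.open_txn(1); tc.open_txn(2)
tc.add_partitions(1, ["p=2020-10-01"])
# Txn 2 aborted before addPartitions: the marker keeps it visible to the cleaner.
assert tc.aborted_needing_cleanup() == [2]
```

The invariant is that a transaction whose marker survives an abort stays visible to the cleaner instead of looking like an empty transaction.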
[jira] [Work logged] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric
[ https://issues.apache.org/jira/browse/HIVE-24157?focusedWorklogId=493935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493935 ] ASF GitHub Bot logged work on HIVE-24157: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:02 Start Date: 02/Oct/20 13:02 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1497: URL: https://github.com/apache/hive/pull/1497 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493935) Time Spent: 1h 10m (was: 1h) > Strict mode to fail on CAST timestamp <-> numeric > - > > Key: HIVE-24157 > URL: https://issues.apache.org/jira/browse/HIVE-24157 > Project: Hive > Issue Type: Improvement > Components: SQL >Reporter: Jesus Camacho Rodriguez >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > There is some interest in enforcing that CAST numeric <\-> timestamp is > disallowed to avoid confusion among users, e.g., SQL standard does not allow > numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc. > We should introduce a strict config for timestamp (similar to others before): > If the config is true, we shall fail while compiling the query with a > meaningful message. > To provide similar behavior, Hive has multiple functions that provide clearer > semantics for numeric to timestamp conversion (and vice versa): > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric
[ https://issues.apache.org/jira/browse/HIVE-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24157. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Jesus for reviewing the changes! > Strict mode to fail on CAST timestamp <-> numeric > - > > Key: HIVE-24157 > URL: https://issues.apache.org/jira/browse/HIVE-24157 > Project: Hive > Issue Type: Improvement > Components: SQL >Reporter: Jesus Camacho Rodriguez >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There is some interest in enforcing that CAST numeric <\-> timestamp is > disallowed to avoid confusion among users, e.g., SQL standard does not allow > numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc. > We should introduce a strict config for timestamp (similar to others before): > If the config is true, we shall fail while compiling the query with a > meaningful message. > To provide similar behavior, Hive has multiple functions that provide clearer > semantics for numeric to timestamp conversion (and vice versa): > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions -- This message was sent by Atlassian Jira (v8.3.4#803005)
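The strict-mode behavior described here (fail at compile time on numeric <-> timestamp casts and point users at the explicit date functions) can be sketched as follows. This is a simplified illustration, not the Hive type checker; the function name and the strict flag are made up for the sketch:

```python
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}

def check_cast_allowed(src: str, dst: str, strict: bool) -> None:
    # Reject a cast that crosses numeric <-> timestamp while strict mode is
    # on, with a meaningful message naming the explicit UDF alternatives.
    crosses = "timestamp" in (src, dst) and bool({src, dst} & NUMERIC_TYPES)
    if strict and crosses:
        raise ValueError(
            f"CAST {src} to {dst} is disabled in strict mode; use "
            "from_unixtime()/unix_timestamp() for explicit conversion")

check_cast_allowed("bigint", "timestamp", strict=False)  # permissive: allowed
try:
    check_cast_allowed("timestamp", "double", strict=True)
    failed = False
except ValueError:
    failed = True
assert failed  # strict mode rejects the cast at compile time
```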
[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers
[ https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206145#comment-17206145 ] Stamatis Zampetakis commented on HIVE-23976: Hi [~abstractdog], While working on HIVE-24221, I got some further questions/ideas regarding this issue. It seems that we make use of n-ary vectorized expressions for the evaluation of AND and OR operators; it's true it is not done with the descriptor but through {{VectorizationContext}}. I am not sure what this means in terms of efficiency, but it looks like we are saving at least some memory since I get the impression that we can reuse the output vector and not have a different output vector per pair of binary operations. We could employ something similar for an n-ary hash function. Assuming that we cannot/should not treat the hash as an n-ary operator, then I think it makes more sense to make it unary (single input, single output) instead of binary, being only a kind of wrapper around Murmur for the different datatypes. By doing this, the implementation will be simpler and we can cover more use cases, as the combine step is delegated to another abstraction. +Currently+ {noformat} hash(a,b) = 31*murmur(a) + murmur(b) {noformat} +After+ {noformat} hash(a) = murmur(a) {noformat} What do you think? > Enable vectorization for multi-col semi join reducers > - > > Key: HIVE-23976 > URL: https://issues.apache.org/jira/browse/HIVE-23976 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-21196 introduces multi-column semi-join reducers in the query engine. > However, the implementation relies on GenericUDFMurmurHash which is not > vectorized, thus the respective operators cannot be executed in vectorized > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
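The proposed refactoring (a unary Murmur wrapper per column, with the combine step delegated to a separate abstraction) can be sketched like this, using Python's built-in hash as a stand-in for Murmur; names are illustrative:

```python
from functools import reduce

def murmur(value) -> int:
    # Stand-in for the unary per-column Murmur hash (single input, single output).
    return hash(value)

def combine(hashes) -> int:
    # Separate n-ary combine step: h = 31*h + next, matching the
    # "31*murmur(a) + murmur(b)" scheme quoted in the comment.
    return reduce(lambda acc, h: 31 * acc + h, hashes)

# The binary form hash(a, b) = 31*murmur(a) + murmur(b) is just the 2-element case:
a, b = "col_a", 42
assert combine([murmur(a), murmur(b)]) == 31 * murmur(a) + murmur(b)
assert combine([murmur(a)]) == murmur(a)  # the unary case degenerates cleanly
```

Keeping the combine step out of the hash function is what lets the unary wrapper stay a thin, per-datatype shim around Murmur.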
[jira] [Work started] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24225 started by Panagiotis Garefalakis. - > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24225: -- Labels: pull-request-available (was: ) > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?focusedWorklogId=493916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493916 ] ASF GitHub Bot logged work on HIVE-24225: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:44 Start Date: 02/Oct/20 11:44 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1547: URL: https://github.com/apache/hive/pull/1547 This reverts commit c87e60d4 Change-Id: Ie8b783e0b1e0e32d9a54f6663e9aae5dd0a0f94f ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493916) Remaining Estimate: 0h Time Spent: 10m > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24225: - > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=493913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493913 ] ASF GitHub Bot logged work on HIVE-24224: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:34 Start Date: 02/Oct/20 11:34 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1546: URL: https://github.com/apache/hive/pull/1546#issuecomment-702682798 @abstractdog @mustafaiman can you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493913) Time Spent: 20m (was: 10m) > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > A compressed file with Hive on Tez returns headers and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+---+---+ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | 
> +---+---+---+ > | offset | id | other | > | 9 | 20200315 X00 1356 | 123 | > | 17 | 20200315 X00 1357 | 123 | > | rst | rst | rst | > +---+---+---+ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=493912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493912 ] ASF GitHub Bot logged work on HIVE-24224: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:31 Start Date: 02/Oct/20 11:31 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1546: URL: https://github.com/apache/hive/pull/1546 Change-Id: I918a2eff0197e7d92db1f1858f3402b874d3b10a ### What changes were proposed in this pull request? Fix header/footer skipping for compressed files -- bug discovered for Hive on Tez ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493912) Remaining Estimate: 0h Time Spent: 10m > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 
'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;"
> +-------------------+--------------------+-----------------+
> | bz2tst2.sequence  | bz2tst2.id         | bz2tst2.other   |
> +-------------------+--------------------+-----------------+
> | offset            | id                 | other           |
> | 9                 | 20200315 X00 1356  | 123             |
> | 17                | 20200315 X00 1357  | 123             |
> | rst               | rst                | rst             |
> +-------------------+--------------------+-----------------+
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24224: -- Labels: pull-request-available (was: ) > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+++ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | > +---+++ > | offset| id | other | > | 9 | 20200315 X00 1356 | 123| > | 17| 20200315 X00 1357 | 123| > | rst | rst| rst| > +---+++ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24224: - > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+++ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | > +---+++ > | offset| id | other | > | 9 | 20200315 X00 1356 | 123| > | 17| 20200315 X00 1357 | 123| > | rst | rst| rst| > +---+++ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
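The reproduction above exercises the `skip.header.line.count` / `skip.footer.line.count` table properties. As a minimal sketch of the intended semantics (illustrative only — class and method names are made up, this is not Hive's record-reader code), skipping amounts to trimming N leading and M trailing rows of each input file:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of skip.header.line.count / skip.footer.line.count semantics:
// drop the first `header` and last `footer` rows of a single file's rows.
// Illustrative only -- this is NOT Hive's record-reader code.
public class HeaderFooterSkip {
    public static List<String> skip(List<String> rows, int header, int footer) {
        int from = Math.min(header, rows.size());
        int to = Math.max(from, rows.size() - footer);
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "offset,id,other",                 // header row
            "9,\"20200315 X00 1356\",123",
            "17,\"20200315 X00 1357\",123",
            "rst,rst,rst");                    // footer row
        System.out.println(skip(rows, 1, 1)); // only the two data rows remain
    }
}
```

For the CSV in the report (one header row, one footer row), only the two data rows should survive. The bug is that on Tez this trimming was not applied to compressed — and therefore unsplittable — input files, so all four rows came back.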
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=493896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493896 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:09 Start Date: 02/Oct/20 11:09 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498651478 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { + sd.getBucketCols().remove(oldColumnName); + sd.getBucketCols().add(desc.getNewColumnName()); Review comment: Should we store it in lower-case? ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { Review comment: Is oldColumnName case in-sensitive? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -130,6 +130,11 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam throw new InvalidOperationException("Invalid column " + validate); } +// Validate bucketedColumns in new table +if (!MetaStoreServerUtils.validateBucketColumns(newt.getSd())) { + throw new InvalidOperationException("Bucket column doesn't match with any table columns"); Review comment: Useful to add an error log with the column name which is missing from table columns list. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ + return columnNames.containsAll(sd.getBucketCols().stream().map(String::toLowerCase).collect(Collectors.toList())); +} else if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isEmpty(columnNames)) { + return false; +} else { + return true; +} + } + + /** + * Generate column name list from the fieldSchema list + * @param cols fieldSchema list + * @return column name list + */ + public static List getColumnNames(List cols) { +if (CollectionUtils.isNotEmpty(cols)) { + return cols.stream().map(FieldSchema::getName).collect(Collectors.toList()); Review comment: Will cols always have names in lower case? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ Review comment: nit: Add space before "(" ## File path: ql/src/test/queries/clientpositive/alter_numbuckets_partitioned_table_h23.q ## @@ -52,6 +52,12 @@ alter table tst1_n1 clustered by (value) into 12 buckets; describe formatted tst1_n1; +-- Test changing name of bucket column + +alter table tst1_n1 change key keys string; + +describe formatted tst1_n1; Review comment: Also check the output of "show create table" command. ## File path: ql/src/test/queries/c
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=493897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493897 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:09 Start Date: 02/Oct/20 11:09 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498652268 ## File path: ql/src/test/queries/clientpositive/alter_bucketedtable_change_column.q ## @@ -0,0 +1,10 @@ +--! qt:dataset:src +create table alter_bucket_change_col_t1(key string, value string) partitioned by (ds string) clustered by (key) into 10 buckets; + +describe formatted alter_bucket_change_col_t1; + +-- Test changing name of bucket column + +alter table alter_bucket_change_col_t1 change key keys string; Review comment: Add test for column names with mix of upper and lower case letters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493897) Time Spent: 1h 20m (was: 1h 10m) > ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names > --- > > Key: HIVE-22826 > URL: https://issues.apache.org/jira/browse/HIVE-22826 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Attachments: unitTest.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Compaction for tables where a bucketed column has been renamed fails since > the list of bucketed columns in the StorageDescriptor doesn't get updated > when the column is renamed, therefore we can't recreate the table correctly > during compaction. > Attached a unit test that fails. 
> NO PRECOMMIT TESTS -- This message was sent by Atlassian Jira (v8.3.4#803005)
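The review thread above probes two things: whether the bucket-column list stays in sync on rename, and whether the comparison is case-insensitive. A self-contained sketch of the validation being discussed, lower-casing both sides as the reviewer suggests (plain string lists stand in for `StorageDescriptor.getBucketCols()` and `FieldSchema` names; this is a sketch, not the patch itself):

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Sketch of the bucket-column validation discussed in the review.
// Both sides are lower-cased before comparison, which addresses the
// case-sensitivity question raised by the reviewer. Plain lists stand in
// for StorageDescriptor/FieldSchema; names here are illustrative.
public class BucketColumnCheck {
    public static boolean validateBucketColumns(List<String> bucketCols,
                                                List<String> tableCols) {
        if (bucketCols == null || bucketCols.isEmpty()) {
            return true; // nothing bucketed, nothing to validate
        }
        List<String> lowered = tableCols.stream()
            .map(c -> c.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
        return lowered.containsAll(bucketCols.stream()
            .map(c -> c.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList()));
    }
}
```

With this shape, renaming a bucketed column without updating `bucketCols` (the bug in this issue) makes the check fail, regardless of the case in which either list stores the names.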
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: > insert into test_dec values ('-.5'); > insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: -> insert into test_dec values ('-.5'); -> insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: > insert into test_dec values ('-.5'); > insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: --> insert into test_dec values ('-.5'); --> insert into test_dec values (-0.5);
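The lexer rule quoted in the description requires at least one digit before the `DOT`, which is exactly why `-.5` fails to tokenize while `-0.5` parses (the leading `-` is a separate unary-minus operator in the parser, not part of the `Number` token). A regex rendering of that rule, assuming the usual `Exponent : ('e'|'E') ('+'|'-')? Digit+` definition (illustrative — this is a rendering of the rule, not the ANTLR grammar itself):

```java
import java.util.regex.Pattern;

// Regex rendering of the quoted lexer rule:
//   Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent )?
// The mandatory (Digit)+ before the optional DOT is what rejects ".5".
// Assumes Exponent : ('e'|'E') ('+'|'-')? Digit+ -- verify against the grammar.
public class NumberRule {
    private static final Pattern NUMBER =
        Pattern.compile("\\d+(\\.\\d*([eE][+-]?\\d+)?|[eE][+-]?\\d+)?");

    public static boolean isNumber(String token) {
        return NUMBER.matcher(token).matches();
    }
}
```

One possible direction for a fix would be to add an alternative to the rule (and the regex) that accepts `DOT (Digit)+` with no leading digits, so `.5` lexes as a number as well.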
[jira] [Commented] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters
[ https://issues.apache.org/jira/browse/HIVE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206086#comment-17206086 ] Stamatis Zampetakis commented on HIVE-24221: There are various ways to create a hash from composite keys/columns. Without any special effort to derive a perfect hash function we can do the following: Input columns: A, B, C, D +Option A:+ {noformat} hash(hash(hash(A, B), C), D) {noformat} +Option B:+ {noformat} 31*(31*(31 * hash(A) + hash(B)) + hash(C)) + hash(D) {noformat} The second option is more or less what happens currently when we write hash(A, B, C, D) in the non-vectorized implementation of GenericUDFMurmurHash. The first option although it looks simpler is computationally more expensive. > Use vectorizable expression to combine multiple columns in semijoin bloom > filters > - > > Key: HIVE-24221 > URL: https://issues.apache.org/jira/browse/HIVE-24221 > Project: Hive > Issue Type: Improvement > Components: Query Planning > Environment: >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, multi-column semijoin reducers use an n-ary call to > GenericUDFMurmurHash to combine multiple values into one, which is used as an > entry to the bloom filter. However, there are no vectorized operators that > treat n-ary inputs. The same goes for the vectorized implementation of > GenericUDFMurmurHash introduced in HIVE-23976. > The goal of this issue is to choose an alternative way to combine multiple > values into one to pass in the bloom filter comprising only vectorized > operators. -- This message was sent by Atlassian Jira (v8.3.4#803005)
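The two composition strategies in the comment above can be sketched generically. The pairwise `mix` below is a Murmur-style stand-in for a two-argument `hash(A, B)`, not Hive's `GenericUDFMurmurHash`; only the shape of the two options is the point:

```java
// Sketch of the two combination options from the comment. `mix` is a
// Murmur-style stand-in for a pairwise hash(A, B); it is NOT Hive's
// GenericUDFMurmurHash, and the constants are illustrative.
public class HashCombine {
    static int mix(int a, int b) {
        int h = a * 0xcc9e2d51;
        h = Integer.rotateLeft(h, 15) * 0x1b873593;
        return (h ^ b) * 5 + 0xe6546b64;
    }

    // Option A: nested binary hashing, hash(hash(hash(A, B), C), D).
    // Every step is a full mixing round, so it costs three mixes.
    public static int optionA(int a, int b, int c, int d) {
        return mix(mix(mix(a, b), c), d);
    }

    // Option B: 31-multiplier polynomial combine over per-column hashes,
    // 31*(31*(31*A + B) + C) + D -- plain multiply-adds, cheap to vectorize.
    public static int optionB(int a, int b, int c, int d) {
        return 31 * (31 * (31 * a + b) + c) + d;
    }
}
```

Option B's multiply-add chain maps directly onto existing vectorized arithmetic expressions, which is the comment's point about why it suits the vectorized path better than repeated full hash invocations.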
[jira] [Commented] (HIVE-19253) HMS ignores tableType property for external tables
[ https://issues.apache.org/jira/browse/HIVE-19253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206064#comment-17206064 ] Szehon Ho commented on HIVE-19253: -- I think there are no new test failures. [~vihangk1] what do you think? > HMS ignores tableType property for external tables > -- > > Key: HIVE-19253 > URL: https://issues.apache.org/jira/browse/HIVE-19253 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 3.1.0, 4.0.0 >Reporter: Alex Kolbasov >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: newbie, pull-request-available > Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, > HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, > HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, > HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, > HIVE-19253.11.patch, HIVE-19253.12.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When someone creates a table using the Thrift API they may think that setting > tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their > table is gone later because HMS will silently change it to a managed table. > Here is the offending code:
> {code:java}
> private MTable convertToMTable(Table tbl) throws InvalidObjectException, MetaException {
>   ...
>   // If the table has property EXTERNAL set, update table type accordingly
>   String tableType = tbl.getTableType();
>   boolean isExternal = Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
>   if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>     if (isExternal) {
>       tableType = TableType.EXTERNAL_TABLE.toString();
>     }
>   }
>   if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>     if (!isExternal) { // Here!
>       tableType = TableType.MANAGED_TABLE.toString();
>     }
>   }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed > even if it was external in the first place - which is wrong.
> More over, in other places code looks at the table property to decide table > type and some places look at parameter. HMS should really make its mind which > one to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
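The problematic branch above demotes a table to managed whenever the EXTERNAL parameter is absent. A minimal standalone sketch of one possible fix (the class name, the plain string constants in place of Hive's {{TableType}} enum, and the fix itself are illustrative assumptions, not the committed patch) is to demote only when EXTERNAL is explicitly present and false:

```java
import java.util.Map;

public class TableTypeResolver {
    // Sketch of the table-type resolution from convertToMTable, with the
    // proposed change: only flip EXTERNAL_TABLE back to MANAGED_TABLE when
    // the "EXTERNAL" parameter is explicitly set to false, not when it is
    // merely missing from the table parameters.
    public static String resolve(String tableType, Map<String, String> params) {
        String external = params.get("EXTERNAL");
        boolean isExternal = Boolean.parseBoolean(external); // null parses to false
        if ("MANAGED_TABLE".equals(tableType) && isExternal) {
            return "EXTERNAL_TABLE";
        }
        // Demote only on an explicit EXTERNAL=false; a table created via
        // Thrift with only tableType=EXTERNAL_TABLE set stays external.
        if ("EXTERNAL_TABLE".equals(tableType) && external != null && !isExternal) {
            return "MANAGED_TABLE";
        }
        return tableType;
    }
}
```

With this check, a Thrift client that sets only the tableType field keeps its external table, while clients that set contradictory values still get the reconciliation behavior.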
[jira] [Updated] (HIVE-24213) Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
[ https://issues.apache.org/jira/browse/HIVE-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marta Kuczora updated HIVE-24213:
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks a lot [~zmatyus] for the patch and [~kgyrtkirk] for the review!

> Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-24213
>                 URL: https://issues.apache.org/jira/browse/HIVE-24213
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Zoltan Matyus
>            Assignee: Zoltan Matyus
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{CommonJoinTaskDispatcher#mergeMapJoinTaskIntoItsChildMapRedTask}}
> method throws a {{SemanticException}} if the number of {{FileSinkOperator}}s
> it finds is not exactly 1. The exception is valid if zero operators are
> found, but there are valid use cases with multiple FileSinkOperators.
> Example: the MapJoin and its child are used in a common table expression,
> which is used for multiple inserts.
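As a hedged illustration of the multi-insert scenario described above (the table and column names are made up for the example, not taken from the ticket), a common table expression built on a join can feed several INSERT branches, giving the plan one FileSinkOperator per target table:

```sql
-- Hypothetical HiveQL: a CTE containing a join (eligible for MapJoin)
-- reused by a multi-insert. Each INSERT branch produces its own
-- FileSinkOperator, so requiring exactly one sink rejects a valid plan.
WITH joined AS (
  SELECT o.id, o.amount, c.region
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
)
FROM joined
INSERT INTO TABLE sales_by_region SELECT region, amount
INSERT INTO TABLE sales_by_order  SELECT id, amount;
```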
[jira] [Work logged] (HIVE-24213) Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
[ https://issues.apache.org/jira/browse/HIVE-24213?focusedWorklogId=493867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493867 ]

ASF GitHub Bot logged work on HIVE-24213:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 09:01
            Start Date: 02/Oct/20 09:01
    Worklog Time Spent: 10m

Work Description: kuczoram merged pull request #1539:
URL: https://github.com/apache/hive/pull/1539

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 493867)
    Time Spent: 20m  (was: 10m)
[jira] [Updated] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis updated HIVE-24222:
------------------------------------------
    Issue Type: Improvement  (was: Bug)

> Upgrade ORC to 1.5.12
> ---------------------
>
>                 Key: HIVE-24222
>                 URL: https://issues.apache.org/jira/browse/HIVE-24222
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
[jira] [Assigned] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis reassigned HIVE-24222:
---------------------------------------------
    Assignee: Dongjoon Hyun
[jira] [Resolved] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita resolved HIVE-21375.
-------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Committed to branch-3. Thanks Peter for reviewing.

> Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21375
>                 URL: https://issues.apache.org/jira/browse/HIVE-21375
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Streaming
>    Affects Versions: 3.2.0
>            Reporter: Shawn Weeks
>            Assignee: Ádám Szita
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.2.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The patch in HIVE-13151 added {{FileSystem.closeAllForUGI(ugi)}} to the
> close method of HiveEndPoint for the legacy Streaming API. This has the side
> effect of closing the FileSystem for all open TransactionBatches, as used by
> NiFi and Storm when writing to multiple partitions. Setting
> {{fs.hdfs.impl.disable.cache=true}} works around the issue, but at a
> performance cost.
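The root cause described above is Hadoop's per-UGI FileSystem cache: with caching enabled (the default), {{FileSystem.get}} hands every caller the same instance for a given user, so closing all instances for that UGI pulls the handle out from under every other open TransactionBatch. The toy model below (assumed names; this is not Hadoop code) illustrates that sharing:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the per-UGI FileSystem cache behavior behind this bug:
// two transaction batches asking for "their" FileSystem get one shared
// instance, so a closeAllForUGI issued by one batch closes the other's too.
public class FsCacheModel {
    public static class Fs {
        public boolean closed = false;
    }

    private static final Map<String, Fs> CACHE = new HashMap<>();

    // Stand-in for FileSystem.get(conf) with caching enabled: one Fs per user.
    public static Fs get(String ugi) {
        return CACHE.computeIfAbsent(ugi, u -> new Fs());
    }

    // Stand-in for FileSystem.closeAllForUGI(ugi): closes the shared instance.
    public static void closeAllForUgi(String ugi) {
        Fs fs = CACHE.remove(ugi);
        if (fs != null) {
            fs.closed = true;
        }
    }
}
```

Setting {{fs.hdfs.impl.disable.cache=true}} sidesteps the problem by giving every caller a fresh instance, at the cost of repeated FileSystem construction, which is why the proper fix avoids the blanket close instead.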
[jira] [Updated] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita updated HIVE-21375:
------------------------------
    Affects Version/s:     (was: 3.2.0)
                       3.1.3
[jira] [Work logged] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?focusedWorklogId=493861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493861 ]

ASF GitHub Bot logged work on HIVE-21375:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 08:19
            Start Date: 02/Oct/20 08:19
    Worklog Time Spent: 10m

Work Description: szlta merged pull request #1541:
URL: https://github.com/apache/hive/pull/1541

Issue Time Tracking
-------------------
    Worklog Id: (was: 493861)
    Time Spent: 1h  (was: 50m)
[jira] [Work logged] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?focusedWorklogId=493860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493860 ]

ASF GitHub Bot logged work on HIVE-21375:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 08:12
            Start Date: 02/Oct/20 08:12
    Worklog Time Spent: 10m

Work Description: szlta commented on pull request #1541:
URL: https://github.com/apache/hive/pull/1541#issuecomment-702591651

Cherry-picked the changes here that allow a test run on this PR. There were many test failures, as branch-3 is currently in bad shape, but none of them seemed related to HCatalog streaming, so I'm moving forward with this change.

Issue Time Tracking
-------------------
    Worklog Id: (was: 493860)
    Time Spent: 50m  (was: 40m)