[jira] [Resolved] (HIVE-26274) No vectorization if query has upper case window function
     [ https://issues.apache.org/jira/browse/HIVE-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-26274.
-----------------------------------
    Resolution: Fixed

Pushed to master. Thanks [~abstractdog] for review.

> No vectorization if query has upper case window function
> ---------------------------------------------------------
>
>                 Key: HIVE-26274
>                 URL: https://issues.apache.org/jira/browse/HIVE-26274
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b int);
> EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
> {code}
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       Vertices:
>         Map 1
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
>                 inputFormatFeatureSupport: [DECIMAL_64]
>                 featureSupportInUse: [DECIMAL_64]
>                 inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2
>             Execution mode: llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true
>                 notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
>                 vectorized: false
>
>   Stage: Stage-0
>     Fetch Operator
> {code}
> {code}
> notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
> {code}

-- 
This message was sent by Atlassian Jira
(v8.20.7#820007)
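[Editor's note] The notVectorizedReason above makes the root cause visible: the function name is reported in its original upper case ("ROW_NUMBER") while the supported-function list contains only lower-case names, so a case-sensitive membership test rejects it. A minimal sketch of that lookup and the case-insensitive variant, in Python for illustration — the names are hypothetical, not Hive's actual Vectorizer code:

```python
# Lower-case supported-function list, as printed in the plan output above.
SUPPORTED_PTF_FUNCTIONS = {"avg", "count", "dense_rank", "first_value", "lag",
                           "last_value", "lead", "max", "min", "rank",
                           "row_number", "sum"}

def is_vectorizable_buggy(func_name):
    # Case-sensitive membership test: "ROW_NUMBER" is rejected even though
    # "row_number" is in the list.
    return func_name in SUPPORTED_PTF_FUNCTIONS

def is_vectorizable_fixed(func_name):
    # Normalize to lower case before the lookup, which is what the fix
    # amounts to conceptually.
    return func_name.lower() in SUPPORTED_PTF_FUNCTIONS
```

With the normalized lookup, SELECT ROW_NUMBER() and SELECT row_number() vectorize identically.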
[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function
     [ https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=777296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777296 ]

ASF GitHub Bot logged work on HIVE-26274:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/22 06:54
            Start Date: 02/Jun/22 06:54
    Worklog Time Spent: 10m

Work Description: kasakrisz merged PR #3332:
URL: https://github.com/apache/hive/pull/3332

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777296)
    Time Spent: 0.5h  (was: 20m)
[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function
     [ https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=777293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777293 ]

ASF GitHub Bot logged work on HIVE-26274:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/22 06:46
            Start Date: 02/Jun/22 06:46
    Worklog Time Spent: 10m

Work Description: abstractdog commented on PR #3332:
URL: https://github.com/apache/hive/pull/3332#issuecomment-1144501074

LGTM, thanks for the patch @kasakrisz

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777293)
    Time Spent: 20m  (was: 10m)
[jira] [Updated] (HIVE-26285) Overwrite database metadata on original source in optimised failover.
     [ https://issues.apache.org/jira/browse/HIVE-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haymant Mangla updated HIVE-26285:
----------------------------------
        Parent: HIVE-25699
    Issue Type: Sub-task  (was: Bug)

> Overwrite database metadata on original source in optimised failover.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-26285
>                 URL: https://issues.apache.org/jira/browse/HIVE-26285
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Haymant Mangla
>            Assignee: Haymant Mangla
>            Priority: Major
>
[jira] [Assigned] (HIVE-26285) Overwrite database metadata on original source in optimised failover.
     [ https://issues.apache.org/jira/browse/HIVE-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haymant Mangla reassigned HIVE-26285:
-------------------------------------
[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early
     [ https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=777258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777258 ]

ASF GitHub Bot logged work on HIVE-21160:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/22 02:42
            Start Date: 02/Jun/22 02:42
    Worklog Time Spent: 10m

Work Description: kasakrisz commented on code in PR #2855:
URL: https://github.com/apache/hive/pull/2855#discussion_r877908690

## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
@@ -3392,11 +3392,19 @@ public static enum ConfVars {
     MERGE_CARDINALITY_VIOLATION_CHECK("hive.merge.cardinality.check", true,
         "Set to true to ensure that each SQL Merge statement ensures that for each row in the target\n" +
         "table there is at most 1 matching row in the source table per SQL Specification."),
+    SPLIT_UPDATE("hive.split.update", true,

Review Comment: When updating larger datasets the split update can perform better, and it is also a precondition for enabling updates of partitioning and bucketing keys.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777258)
    Time Spent: 3h 20m  (was: 3h 10m)

> Rewrite Update statement as Multi-insert and do Update split early
> ------------------------------------------------------------------
>
>                 Key: HIVE-21160
>                 URL: https://issues.apache.org/jira/browse/HIVE-21160
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early
     [ https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=777254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777254 ]

ASF GitHub Bot logged work on HIVE-21160:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Jun/22 02:32
            Start Date: 02/Jun/22 02:32
    Worklog Time Spent: 10m

Work Description: kasakrisz commented on code in PR #2855:
URL: https://github.com/apache/hive/pull/2855#discussion_r887446340

## ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java:
@@ -470,8 +405,7 @@ private String handleUpdate(ASTNode whenMatchedUpdateClause, StringBuilder rewri
       rewrittenQueryStr.append(" AND NOT(").append(deleteExtraPredicate).append(")");
     }
     if (!splitUpdateEarly) {
-      rewrittenQueryStr.append("\n SORT BY ");
-      rewrittenQueryStr.append(targetName).append(".ROW__ID ");
+      appendSortBy(rewrittenQueryStr, Collections.singletonList(targetName + ".ROW__ID "));
     }
     rewrittenQueryStr.append("\n");

Review Comment: Added logging of the rewritten AST in RewriteSA before passing it to super.analyze.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777254)
    Time Spent: 3h 10m  (was: 3h)
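[Editor's note] The "split update" being reviewed above rewrites a single UPDATE into a multi-insert: one table scan feeding two insert branches, one carrying the ROW__IDs of rows to delete (sorted by ROW__ID, as in the diff) and one carrying the rewritten rows. A toy text-level sketch of that shape — illustrative only; the real rewrite operates on Hive's AST, and the function and clause layout here are hypothetical:

```python
def rewrite_update_as_multi_insert(table, select_list, predicate):
    """Toy illustration of the split-update rewrite: the original
    UPDATE becomes a FROM clause with two INSERT branches, a delete
    branch keyed and sorted by ROW__ID and an insert branch emitting
    the updated column values."""
    return (
        f"FROM {table}\n"
        # Delete branch: emit the ROW__IDs of matching rows, sorted.
        f"INSERT INTO {table} SELECT ROW__ID WHERE {predicate} SORT BY ROW__ID\n"
        # Insert branch: emit the rows with the updated values applied.
        f"INSERT INTO {table} SELECT {select_list} WHERE {predicate}"
    )
```

Splitting early like this lets both branches share one scan, which is why the review comment notes it can perform better on larger datasets.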
[jira] [Commented] (HIVE-21304) Make bucketing version usage more robust
     [ https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545226#comment-17545226 ]

katty he commented on HIVE-21304:
---------------------------------

Can this patch be used on Hive 3.1.2? When I apply it on Hive 3.1.2, the test testMergeOnTezEdges does not pass; is there some other patch I should pick?

> Make bucketing version usage more robust
> ----------------------------------------
>
>                 Key: HIVE-21304
>                 URL: https://issues.apache.org/jira/browse/HIVE-21304
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Zoltan Haindrich
>            Priority: Major
>             Fix For: 4.0.0, 4.0.0-alpha-1
>
>         Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, HIVE-21304.03.patch,
> HIVE-21304.04.patch, HIVE-21304.05.patch, HIVE-21304.06.patch, HIVE-21304.07.patch,
> HIVE-21304.08.patch, HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch,
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, HIVE-21304.15.patch,
> HIVE-21304.16.patch, HIVE-21304.17.patch, HIVE-21304.18.patch, HIVE-21304.19.patch,
> HIVE-21304.20.patch, HIVE-21304.21.patch, HIVE-21304.22.patch, HIVE-21304.23.patch,
> HIVE-21304.24.patch, HIVE-21304.25.patch, HIVE-21304.26.patch, HIVE-21304.27.patch,
> HIVE-21304.28.patch, HIVE-21304.29.patch, HIVE-21304.30.patch, HIVE-21304.31.patch,
> HIVE-21304.32.patch, HIVE-21304.33.patch, HIVE-21304.33.patch, HIVE-21304.33.patch,
> HIVE-21304.34.patch, HIVE-21304.34.patch, HIVE-21304.35.patch, HIVE-21304.35.patch,
> HIVE-21304.36.patch, HIVE-21304.37.patch, HIVE-21304.38.patch, HIVE-21304.38.patch,
> HIVE-21304.38.patch, HIVE-21304.39.patch, HIVE-21304.40.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Show the bucketing version for ReduceSinkOp in the explain extended plan - this helps identify what hashing algorithm is being used by the ReduceSinkOp.
> * move the actually selected version to the "conf" so that it doesn't get lost
> * replace trait related logic with a separate optimizer rule
> * do version selection based on a group of operators - this is more reliable
> * skip bucketing version selection for tables with 1 bucket
> * prefer to use version 2 if possible
> * fix operator creations which didn't set a new conf
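[Editor's note] The selection rules in the bullet list above can be condensed into a small decision function. The sketch below is a hypothetical Python rendering of those rules, not the actual Hive optimizer rule (which is Java and operates on operator trees): skip selection for single-bucket tables, prefer version 2 when every operator in the group can use it, otherwise fall back to version 1.

```python
def select_bucketing_version(num_buckets, versions_usable_by_operators):
    """Illustrative per-operator-group bucketing version selection.
    `versions_usable_by_operators` is a list of sets, one per operator
    in the group, naming the hashing versions that operator supports."""
    if num_buckets <= 1:
        # With a single bucket the hashing version is irrelevant, so
        # selection is skipped entirely.
        return None
    if all(2 in versions for versions in versions_usable_by_operators):
        return 2  # prefer version 2 if the whole group supports it
    return 1      # otherwise fall back to the legacy version
```

Deciding once per operator group (rather than per operator) is what makes the choice reliable: every ReduceSink feeding the same bucketed write hashes with the same algorithm.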
[jira] [Resolved] (HIVE-26230) Option to URL encode special chars in hbase.column.mapping that are valid HBase column family chars
     [ https://issues.apache.org/jira/browse/HIVE-26230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita resolved HIVE-26230.
-------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Committed to master. Thanks for reviewing [~pvary]

> Option to URL encode special chars in hbase.column.mapping that are valid
> HBase column family chars
> --------------------------------------------------------------------------
>
>                 Key: HIVE-26230
>                 URL: https://issues.apache.org/jira/browse/HIVE-26230
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-26015 and HIVE-26139 aimed to fix missing special character handling for
> values provided in hbase.column.mapping. Values here are used as a URL for
> Ranger based authentication, and special characters need to be URL encoded
> for this feature.
> This is currently done only for the # char. We should handle all special
> characters that are valid HBase column family characters but count as special
> characters in URLs.
> The URL encoding of HIVE-26015 should come back, as in HBase we can have
> almost any character in a column family name (excluding : / ). To make this a
> backward-compatible change, the URL encoding will essentially be optional, so
> users won't have to make changes to their working environment. Should they
> encounter a special character in their HBase table definition though, they
> can turn this URL encoding feature on, which in turn comes with the
> requirement from their end to update their Ranger policies so they are in URL
> encoded format for these tables.
[jira] [Work logged] (HIVE-26230) Option to URL encode special chars in hbase.column.mapping that are valid HBase column family chars
     [ https://issues.apache.org/jira/browse/HIVE-26230?focusedWorklogId=777149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777149 ]

ASF GitHub Bot logged work on HIVE-26230:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 01/Jun/22 20:41
            Start Date: 01/Jun/22 20:41
    Worklog Time Spent: 10m

Work Description: szlta merged PR #3314:
URL: https://github.com/apache/hive/pull/3314

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777149)
    Time Spent: 20m  (was: 10m)
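[Editor's note] The opt-in behavior described in HIVE-26230 — leave hbase.column.mapping values untouched by default, URL decode them only when the feature is enabled — can be sketched in a few lines. This is a hypothetical Python illustration of the semantics, not the Java code in the Hive HBase handler; the function name and flag are invented for the example:

```python
from urllib.parse import unquote

def decode_column_mapping(mapping, url_encoding_enabled=False):
    """Split a comma-separated hbase.column.mapping value into entries
    and, only when the opt-in flag is set, URL decode each entry so
    characters that are legal in HBase column families but special in
    URLs (e.g. '#') round-trip through URL-based Ranger policies."""
    entries = mapping.split(",")
    if not url_encoding_enabled:
        # Backward-compatible default: no decoding, existing mappings
        # keep working unchanged.
        return entries
    return [unquote(entry) for entry in entries]
```

With the flag off the mapping behaves exactly as before; with it on, users must store the URL-encoded form in their table definitions and Ranger policies.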
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
     [ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777136 ]

ASF GitHub Bot logged work on HIVE-26244:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 01/Jun/22 19:59
            Start Date: 01/Jun/22 19:59
    Worklog Time Spent: 10m

Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r887249049

## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java:
@@ -99,7 +99,8 @@ public int execute() throws HiveException {
       createTableNonReplaceMode(tbl);
     }

-    DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context);
+  DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context);

Review Comment: Removed populating outputs in CreateTableOperation in the previous commit and retained it only in the semantic analyzer. When removing the previous code, an extra space crept in; will remove the extra line and space.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777136)
    Time Spent: 2h  (was: 1h 50m)

> Implementing locking for concurrent ctas
> ----------------------------------------
>
>                 Key: HIVE-26244
>                 URL: https://issues.apache.org/jira/browse/HIVE-26244
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri G
>            Assignee: Simhadri G
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
[jira] [Resolved] (HIVE-26272) Inline util code that is used from log4j jar
     [ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HIVE-26272.
---------------------------------
    Fix Version/s: 4.0.0-alpha-2
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Inline util code that is used from log4j jar
> --------------------------------------------
>
>                 Key: HIVE-26272
>                 URL: https://issues.apache.org/jira/browse/HIVE-26272
>             Project: Hive
>          Issue Type: Improvement
>          Components: Server Infrastructure
>    Affects Versions: 3.1.3
>            Reporter: PJ Fanning
>            Assignee: PJ Fanning
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-2
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/DRILL-8240 and related issues for
> background.
> HiveServer2 uses the log4j Strings class for an isBlank method.
> I can add a PR to inline this code.
[jira] [Commented] (HIVE-26272) Inline util code that is used from log4j jar
     [ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545063#comment-17545063 ]

Ayush Saxena commented on HIVE-26272:
-------------------------------------

Merged PR to master. Thanx [~pj.fanning] for the contribution!!!
[jira] [Updated] (HIVE-26272) Inline util code that is used from log4j jar
     [ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-26272:
--------------------------------
    Summary: Inline util code that is used from log4j jar  (was: inline log4j util code used in HiveServer2.java)
[jira] [Work logged] (HIVE-26272) inline log4j util code used in HiveServer2.java
     [ https://issues.apache.org/jira/browse/HIVE-26272?focusedWorklogId=777054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777054 ]

ASF GitHub Bot logged work on HIVE-26272:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 01/Jun/22 17:48
            Start Date: 01/Jun/22 17:48
    Worklog Time Spent: 10m

Work Description: ayushtkn merged PR #3330:
URL: https://github.com/apache/hive/pull/3330

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777054)
    Time Spent: 40m  (was: 0.5h)
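[Editor's note] The utility being inlined here is tiny: log4j's Strings.isBlank, which HiveServer2 used for a single check. Its semantics — true for null, the empty string, or whitespace-only input — are easy to state; the Python below is an illustrative rendering of those semantics, not the Java code that was actually inlined:

```python
def is_blank(s):
    """True when s is None, empty, or contains only whitespace --
    the contract of the inlined isBlank helper."""
    return s is None or s.strip() == ""
```

Inlining a three-line predicate removes a compile-time dependency on log4j internals, which is the point of the ticket (see DRILL-8240 for the motivating background).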
[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
     [ https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=777009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777009 ]

ASF GitHub Bot logged work on HIVE-26196:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 01/Jun/22 16:49
            Start Date: 01/Jun/22 16:49
    Worklog Time Spent: 10m

Work Description: asolimando commented on PR #3254:
URL: https://github.com/apache/hive/pull/3254#issuecomment-1143867392

@kgyrtkirk can you have another look?

At the moment we are analyzing PRs and the master branch. I don't think we need to support labels or anything, because whatever change we want to make has to pass through a PR, so we will spot eventual issues before merging; I think we can stick with just analyzing the master branch. WDYT?

PS: I have updated the title of the Jira ticket; the message of the first commit is outdated. I am not amending it because otherwise it will trigger CI again for nothing.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777009)
    Time Spent: 1h 10m  (was: 1h)

> Integrate Sonar analysis for the master branch and PRs
> ------------------------------------------------------
>
>                 Key: HIVE-26196
>                 URL: https://issues.apache.org/jira/browse/HIVE-26196
>             Project: Hive
>          Issue Type: Improvement
>          Components: Build Infrastructure
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for
> the pipeline, but this step is run in parallel with the test run, so the
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new
> commits in the master branch and for PRs, without any quality gate rules
> (i.e., the analysis will never fail, independently of the values of the
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive&pullRequest=3254]
> After integrating the changes, PRs will also be decorated with a link to the
> analysis to be able to better evaluate any pain points of the contribution at
> an earlier stage, making the life of the reviewers a bit easier.
[jira] [Updated] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
     [ https://issues.apache.org/jira/browse/HIVE-26270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26270:
----------------------------------
    Labels: compatibility pull-request-available timestamp  (was: compatibility timestamp)

> Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-26270
>                 URL: https://issues.apache.org/jira/browse/HIVE-26270
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Parquet
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: compatibility, pull-request-available, timestamp
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific.
> {code:sql}
> CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET;
> INSERT INTO employee VALUES
> (1, '1880-01-01 00:00:00'),
> (2, '1884-01-01 00:00:00'),
> (3, '1990-01-01 00:00:00');
> {code}
> Parquet files read with Hive 4.0.0-alpha-1 onwards.
> +Without vectorization+ results are correct.
> {code:sql}
> SELECT * FROM employee;
> {code}
> {noformat}
> 1	1880-01-01 00:00:00
> 2	1884-01-01 00:00:00
> 3	1990-01-01 00:00:00
> {noformat}
> +With vectorization+ some timestamps are shifted.
> {code:sql}
> -- Disable fetch task conversion to force vectorization to kick in
> set hive.fetch.task.conversion=none;
> SELECT * FROM employee;
> {code}
> {noformat}
> 1	1879-12-31 23:52:58
> 2	1884-01-01 00:00:00
> 3	1990-01-01 00:00:00
> {noformat}
> The problem is the same as reported under HIVE-24074: the data were written
> using the new Date/Time APIs (java.time) in Hive 3.1.3, and here they were
> read using the old APIs (java.sql).
> The difference from HIVE-24074 is that here the problem appears only for
> vectorized execution while the non-vectorized reader works fine, so there is
> an *inconsistency in the behavior* of the vectorized and non-vectorized
> readers.
> The non-vectorized reader works fine because it derives automatically that it
> should use the new JDK APIs to read back the timestamp value. This is
> possible in this case because there is metadata in the file (i.e., the
> presence of {{writer.time.zone}}) from which it can infer that the
> timestamps were written using the new Date/Time APIs.
> The inconsistent behavior between the vectorized and non-vectorized readers is
> a regression caused by HIVE-25104. This JIRA is an attempt to re-align the
> behavior between vectorized and non-vectorized readers.
> Note that if the file metadata is empty, neither the vectorized nor the
> non-vectorized reader can determine which APIs to use for the conversion; in
> this case the user must set
> {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get back
> the correct results.
[jira] [Work logged] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
[ https://issues.apache.org/jira/browse/HIVE-26270?focusedWorklogId=776969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776969 ] ASF GitHub Bot logged work on HIVE-26270: - Author: ASF GitHub Bot Created on: 01/Jun/22 15:49 Start Date: 01/Jun/22 15:49 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request, #3338: URL: https://github.com/apache/hive/pull/3338 ### What changes were proposed in this pull request? 1. Extract legacy conversion derivation logic based on file metadata and configuration into separate method. 2. Use the same logic for determining the conversion in both vectorized and non-vectorized Parquet readers by exploiting the new method. ### Why are the changes needed? 1. Remedy "wrong" results when using the vectorized reader 2. Align behavior between vectorized/non-vectorized code ### Does this PR introduce _any_ user-facing change? Yes, result of the queries may be affected. ### How was this patch tested? `mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=parquet_timestamp_int96_compatibility_hive3_1_3.q` Compare wrong results in https://github.com/apache/hive/commit/5a1512ccf1619d744e65aa1a882326cb9df60dd8 with correct results https://github.com/apache/hive/commit/e38b4ec868043e897ca2cc9da8b40a4742cb4757 Issue Time Tracking --- Worklog Id: (was: 776969) Remaining Estimate: 0h Time Spent: 10m > Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader > - > > Key: HIVE-26270 > URL: https://issues.apache.org/jira/browse/HIVE-26270 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Parquet >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: compatibility, timestamp > Time Spent: 10m > Remaining Estimate: 0h > > Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific. 
> {code:sql} > CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET; > INSERT INTO employee VALUES > (1, '1880-01-01 00:00:00'), > (2, '1884-01-01 00:00:00'), > (3, '1990-01-01 00:00:00'); > {code} > Parquet files read with Hive 4.0.0-apha-1 onwards. > +Without vectorization+ results are correct. > {code:sql} > SELECT * FROM employee; > {code} > {noformat} > 1 1880-01-01 00:00:00 > 2 1884-01-01 00:00:00 > 3 1990-01-01 00:00:00 > {noformat} > +With vectorization+ some timestamps are shifted. > {code:sql} > -- Disable fetch task conversion to force vectorization kick in > set hive.fetch.task.conversion=none; > SELECT * FROM employee; > {code} > {noformat} > 1 1879-12-31 23:52:58 > 2 1884-01-01 00:00:00 > 3 1990-01-01 00:00:00 > {noformat} > The problem is the same reported under HIVE-24074. The data were written > using the new Date/Time APIs (java.time) in version Hive 3.1.3 and here they > were read using the old APIs (java.sql). > The difference with HIVE-24074 is that here the problem appears only for > vectorized execution while the non-vectorized reader is working fine so there > is some *inconsistency in the behavior* of vectorized and non vectorized > readers. > Non-vectorized reader works fine cause it derives automatically that it > should use the new JDK APIs to read back the timestamp value. This is > possible in this case cause there are metadata information in the file (i.e., > the presence of {{{}writer.time.zone{}}}) from where it can infer that the > timestamps were written using the new Date/Time APIs. > The inconsistent behavior between vectorized and non-vectorized reader is a > regression caused by HIVE-25104. This JIRA is an attempt to re-align the > behavior between vectorized and non-vectorized readers. 
> Note that if the file metadata are empty, neither the vectorized nor the non-vectorized
> reader can determine which APIs to use for the conversion, and in that case
> the user must set
> {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get back
> the correct results.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
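The 7-minute-2-second shift in the first row above is exactly the difference between US/Pacific local mean time (LMT), which the IANA tz data — and therefore java.time — applies to dates before the 1883 adoption of US standard time zones, and the PST offset that the legacy java.sql path effectively uses. A minimal Python sketch (not Hive code; it only illustrates the offset arithmetic, assuming standard IANA tz data is available):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # IANA tz rules, the same data java.time consults

LA = ZoneInfo("America/Los_Angeles")

# Before 1883-11-18 the tz database assigns Los Angeles its local mean
# time, UTC-07:52:58; afterwards it uses standard PST, UTC-08:00.
pre_1883 = datetime(1880, 1, 1, tzinfo=LA)
post_1883 = datetime(1990, 1, 1, tzinfo=LA)

# The gap between the two offsets matches the shifted query result:
# 1880-01-01 00:00:00 read back as 1879-12-31 23:52:58.
shift = pre_1883.utcoffset() - post_1883.utcoffset()
print(shift)  # 0:07:02
```

Readers that apply the LMT rule on write but not on read (or vice versa) disagree by this amount, which is why only pre-1883 timestamps are affected.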
[jira] [Updated] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-26196:
----------------------------------------
Description:
The aim of the ticket is to integrate SonarCloud analysis for the master branch and PRs.

The ticket does not cover test coverage at the moment (it can be added in follow-up tickets, if there is enough interest).

From preliminary tests, the analysis step requires 30 additional minutes for the pipeline, but this step runs in parallel with the test run, so the total end-to-end run time is not affected.

The idea for this first integration is to track code quality metrics over new commits in the master branch and for PRs, without any quality gate rules (i.e., the analysis will never fail, independently of the values of the quality metrics).

An example of an analysis is available in the ASF Sonar account for Hive: [PR analysis|https://sonarcloud.io/summary/new_code?id=apache_hive&pullRequest=3254]

After integrating the changes, PRs will also be decorated with a link to the analysis, making it possible to evaluate any pain points of a contribution at an earlier stage and making the life of reviewers a bit easier.

was:
The aim of the ticket is to integrate SonarCloud analysis for the master branch.

The ticket does not cover:
* test coverage
* analysis on PRs and other branches

Those aspects can be added in follow-up tickets, if there is enough interest.

From preliminary tests, the analysis step requires 30 additional minutes for the pipeline.

The idea for this first integration is to track code quality metrics over new commits in the master branch, without any quality gate rules (i.e., the analysis will never fail, independently of the values of the quality metrics).
An example of an analysis is available in my personal Sonar account: [https://sonarcloud.io/summary/new_code?id=asolimando_hive]

The ASF offers SonarCloud accounts for Apache projects, and Hive already has one (https://sonarcloud.io/project/configuration?id=apache_hive, created via INFRA-22542). To complete the present ticket, somebody with admin permissions in that repo should generate an authentication token, which should replace the _SONAR_TOKEN_ secret in Jenkins.

> Integrate Sonar analysis for the master branch and PRs
> ------------------------------------------------------
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for
> the pipeline, but this step runs in parallel with the test run, so the
> total end-to-end run time is not affected.
> The idea for this first integration is to track code quality metrics over new
> commits in the master branch and for PRs, without any quality gate rules
> (i.e., the analysis will never fail, independently of the values of the
> quality metrics).
> An example of an analysis is available in the ASF Sonar account for Hive: [PR
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive&pullRequest=3254]
> After integrating the changes, PRs will also be decorated with a link to the
> analysis, making it possible to evaluate any pain points of a contribution at
> an earlier stage and making the life of reviewers a bit easier.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Updated] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs
[ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-26196:
----------------------------------------
Summary: Integrate Sonar analysis for the master branch and PRs (was: Integrate Sonar analysis for the master branch)

> Integrate Sonar analysis for the master branch and PRs
> ------------------------------------------------------
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master
> branch.
> The ticket does not cover:
> * test coverage
> * analysis on PRs and other branches
> Those aspects can be added in follow-up tickets, if there is enough interest.
> From preliminary tests, the analysis step requires 30 additional minutes for
> the pipeline.
> The idea for this first integration is to track code quality metrics over new
> commits in the master branch, without any quality gate rules (i.e., the
> analysis will never fail, independently of the values of the quality metrics).
> An example of an analysis is available in my personal Sonar account:
> [https://sonarcloud.io/summary/new_code?id=asolimando_hive]
> The ASF offers SonarCloud accounts for Apache projects, and Hive already has
> one (https://sonarcloud.io/project/configuration?id=apache_hive, created via
> INFRA-22542). To complete the present ticket, somebody with admin
> permissions in that repo should generate an authentication token, which
> should replace the _SONAR_TOKEN_ secret in Jenkins.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch
[ https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=776890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776890 ]

ASF GitHub Bot logged work on HIVE-26196:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Jun/22 14:19
Start Date: 01/Jun/22 14:19
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #3254:
URL: https://github.com/apache/hive/pull/3254#issuecomment-1143673904

Kudos, SonarCloud Quality Gate passed!
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3254
0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells.
No Coverage information. No Duplication information.

Issue Time Tracking
-------------------

Worklog Id: (was: 776890)
Time Spent: 1h (was: 50m)

> Integrate Sonar analysis for the master branch
> ----------------------------------------------
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master
> branch.
> The ticket does not cover:
> * test coverage
> * analysis on PRs and other branches
> Those aspects can be added in follow-up tickets, if there is enough interest.
> From preliminary tests, the analysis step requires 30 additional minutes for
> the pipeline.
> The idea for this first integration is to track code quality metrics over new
> commits in the master branch, without any quality gate rules (i.e., the
> analysis will never fail, independently of the values of the quality metrics).
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776879 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:59 Start Date: 01/Jun/22 13:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886842968 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map implements CloseableIterator { Review Comment: Moved and replaced `4` to `FILE_READ_META_COLS.size()` Issue Time Tracking --- Worklog Id: (was: 776879) Time Spent: 5h 20m (was: 5h 10m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently virtual columns are fetched from iceberg tables if the statement > being executed is a delete or update statement and the setting is global. It > means it affects all tables affected by the statement. Also the read and > write schema depends on the operation setting. 
> Some statements fails due to invalid schema: > {code} > create external table tbl_ice(a int, b string, c int) stored by iceberg > stored as orc tblproperties ('format-version'='2'); > insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test cases logs. > org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, > vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task > failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt > 0 failed, info=[Error: Error while running task ( failure ) : > attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 15 more > Caused by: org.apache.hadoop.h
[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types
[ https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776871 ]

ASF GitHub Bot logged work on HIVE-26282:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:43
Start Date: 01/Jun/22 13:43
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3337:
URL: https://github.com/apache/hive/pull/3337#discussion_r886824007

## iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java:
@@ -83,8 +83,9 @@ Type convertType(TypeInfo typeInfo) {
         return Types.BooleanType.get();
       case BYTE:
       case SHORT:
-        Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer instead",
-            ((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());
+        Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer " +

Review Comment: `Unsupported Hive type: %s, use integer instead or enable automatic type conversion, set 'iceberg.mr.schema.auto.conversion' to true`?

Issue Time Tracking
-------------------

Worklog Id: (was: 776871)
Time Spent: 0.5h (was: 20m)

> Improve iceberg CTAS error message for unsupported types
> --------------------------------------------------------
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When running a CTAS query using a Hive table that has a tinyint, smallint,
> varchar or char column, it fails with an "Unsupported Hive type" error
> message. This can be worked around if the
> 'iceberg.mr.schema.auto.conversion' property is set to true at session level.
> We should communicate this possibility when raising the exception.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types
[ https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776870 ]

ASF GitHub Bot logged work on HIVE-26282:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:42
Start Date: 01/Jun/22 13:42
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3337:
URL: https://github.com/apache/hive/pull/3337#discussion_r886823347

## iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java:
@@ -83,8 +83,9 @@ Type convertType(TypeInfo typeInfo) {
         return Types.BooleanType.get();
       case BYTE:
       case SHORT:
-        Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer instead",
-            ((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());
+        Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer " +
+            "instead. To enable automatic type conversion, set 'iceberg.mr.schema.auto.conversion' to true " +
+            "on session level.", ((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());

Review Comment: Why `on session level`?

Issue Time Tracking
-------------------

Worklog Id: (was: 776870)
Time Spent: 20m (was: 10m)

> Improve iceberg CTAS error message for unsupported types
> --------------------------------------------------------
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When running a CTAS query using a Hive table that has a tinyint, smallint,
> varchar or char column, it fails with an "Unsupported Hive type" error
> message. This can be worked around if the
> 'iceberg.mr.schema.auto.conversion' property is set to true at session level.
> We should communicate this possibility when raising the exception.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
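The guard being reworded in the diff above can be sketched as follows. This is a hypothetical, simplified Python rendering of the BYTE/SHORT branch — the names, the type set, and the fallthrough behavior are illustrative only; Hive's actual converter lives in `HiveSchemaConverter.java`:

```python
# Hypothetical sketch (not Hive's actual code): narrow integer types are
# only accepted when automatic conversion is enabled, and are then widened
# to Iceberg's 32-bit integer type.
NARROW_INT_TYPES = {"tinyint", "smallint"}

def convert_type(hive_type: str, auto_convert: bool) -> str:
    if hive_type in NARROW_INT_TYPES:
        if not auto_convert:
            # Mirrors the improved message: point the user at the
            # 'iceberg.mr.schema.auto.conversion' escape hatch.
            raise ValueError(
                f"Unsupported Hive type: {hive_type}, use integer instead. "
                "To enable automatic type conversion, set "
                "'iceberg.mr.schema.auto.conversion' to true.")
        return "int"  # widen tinyint/smallint to integer
    # Simplification: other primitive types map through unchanged here.
    return hive_type

print(convert_type("smallint", True))  # int
```

The design question raised in the review is only about the message text: whether it should mention where the property may be set, not how the conversion itself works.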
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776865 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:30 Start Date: 01/Jun/22 13:30 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886809647 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java: ## @@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas if (source instanceof TableScanOperator) { TableScanOperator ts = (TableScanOperator) source; // push down projections -ColumnProjectionUtils.appendReadColumns( -job, ts.getNeededColumnIDs(), ts.getNeededColumns(), ts.getNeededNestedColumnPaths()); +ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), ts.getNeededColumns(), +ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols()); Review Comment: Unfortunately it is not consistent when we are expose or not. Example.: https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java#L86 Issue Time Tracking --- Worklog Id: (was: 776865) Time Spent: 5h 10m (was: 5h) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Currently virtual columns are fetched from iceberg tables if the statement > being executed is a delete or update statement and the setting is global. It > means it affects all tables affected by the statement. Also the read and > write schema depends on the operation setting. 
> Some statements fails due to invalid schema: > {code} > create external table tbl_ice(a int, b string, c int) stored by iceberg > stored as orc tblproperties ('format-version'='2'); > insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test cases logs. > org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, > vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task > failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt > 0 failed, info=[Error: Error while running task ( failure ) : > attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.Runtime
[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types
[ https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776863 ] ASF GitHub Bot logged work on HIVE-26282: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:27 Start Date: 01/Jun/22 13:27 Worklog Time Spent: 10m Work Description: lcspinter opened a new pull request, #3337: URL: https://github.com/apache/hive/pull/3337 ### What changes were proposed in this pull request? Improve error message and add some unit tests ### Why are the changes needed? When running a CTAS query using a hive table that has a tinyint, smallint, varchar or char column it fails with an "Unsupported Hive type" error message. This can be worked around if the 'iceberg.mr.schema.auto.conversion' property is set to true on session level. We should communicate this possibility when raising the exception. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Unit test Issue Time Tracking --- Worklog Id: (was: 776863) Remaining Estimate: 0h Time Spent: 10m > Improve iceberg CTAS error message for unsupported types > > > Key: HIVE-26282 > URL: https://issues.apache.org/jira/browse/HIVE-26282 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When running a CTAS query using a hive table that has a tinyint, smallint, > varchar or char column it fails with an "Unsupported Hive type" error > message. This can be worked around if the > 'iceberg.mr.schema.auto.conversion' property is set to true on session level. > We should communicate this possibility when raising the exception. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.3
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=776864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776864 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:27 Start Date: 01/Jun/22 13:27 Worklog Time Spent: 10m Work Description: steveloughran commented on PR #3279: URL: https://github.com/apache/hive/pull/3279#issuecomment-1143612371 > I would guess the directory listing order might have changed... shouldn't have AFAIK Issue Time Tracking --- Worklog Id: (was: 776864) Time Spent: 12.05h (was: 11h 53m) > Upgrade Hadoop to 3.3.3 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 12.05h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26282) Improve iceberg CTAS error message for unsupported types
[ https://issues.apache.org/jira/browse/HIVE-26282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26282: -- Labels: pull-request-available (was: ) > Improve iceberg CTAS error message for unsupported types > > > Key: HIVE-26282 > URL: https://issues.apache.org/jira/browse/HIVE-26282 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When running a CTAS query using a hive table that has a tinyint, smallint, > varchar or char column it fails with an "Unsupported Hive type" error > message. This can be worked around if the > 'iceberg.mr.schema.auto.conversion' property is set to true on session level. > We should communicate this possibility when raising the exception. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HIVE-26280) Copy more data into COMPLETED_COMPACTIONS for better supportability
[ https://issues.apache.org/jira/browse/HIVE-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage reassigned HIVE-26280: > Copy more data into COMPLETED_COMPACTIONS for better supportability > --- > > Key: HIVE-26280 > URL: https://issues.apache.org/jira/browse/HIVE-26280 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > > There is some information in COMPACTION_QUEUE that doesn't get copied over to > COMPLETED_COMPACTIONS when compaction completes. It would help with > supportability if COMPLETED_COMPACTIONS (and especially the view of it in the > SYS database) also contained this information. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HIVE-26282) Improve iceberg CTAS error message for unsupported types
[ https://issues.apache.org/jira/browse/HIVE-26282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér reassigned HIVE-26282: > Improve iceberg CTAS error message for unsupported types > > > Key: HIVE-26282 > URL: https://issues.apache.org/jira/browse/HIVE-26282 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > > When running a CTAS query using a hive table that has a tinyint, smallint, > varchar or char column it fails with an "Unsupported Hive type" error > message. This can be worked around if the > 'iceberg.mr.schema.auto.conversion' property is set to true on session level. > We should communicate this possibility when raising the exception. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HIVE-26281) Missing statistics when requesting partition by names via HS2
[ https://issues.apache.org/jira/browse/HIVE-26281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis reassigned HIVE-26281:
------------------------------------------

> Missing statistics when requesting partition by names via HS2
> -------------------------------------------------------------
>
> Key: HIVE-26281
> URL: https://issues.apache.org/jira/browse/HIVE-26281
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> The [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
> method can be used to obtain partition objects from the metastore by
> specifying their names and other options.
> {code:java}
> public List<Partition> getPartitionsByNames(Table tbl, List<String> partNames, boolean getColStats)
> {code}
> However, the partition statistics are missing from the returned objects no
> matter the value of the {{getColStats}} parameter.
> The problem is
> [here|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4174]
> and was caused by HIVE-24743.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
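The defect class described above — a flag accepted by the public API but never forwarded to the layer that honors it, so callers get the same result either way — can be illustrated with a small sketch. All names here are hypothetical stand-ins, not Hive's actual implementation:

```python
# Illustrative sketch of the HIVE-26281 bug class (not Hive's code):
# the public method accepts a get_col_stats flag, but the buggy variant
# drops it before the fetch, so statistics are always omitted.
def fetch_partitions(part_names, with_stats):
    # Stand-in for the metastore call; stats attached only when requested.
    return [{"name": n, "stats": {"rows": 0} if with_stats else None}
            for n in part_names]

def get_partitions_by_names_buggy(part_names, get_col_stats):
    return fetch_partitions(part_names, with_stats=False)  # flag dropped

def get_partitions_by_names_fixed(part_names, get_col_stats):
    return fetch_partitions(part_names, with_stats=get_col_stats)

buggy = get_partitions_by_names_buggy(["p=1"], get_col_stats=True)
fixed = get_partitions_by_names_fixed(["p=1"], get_col_stats=True)
print(buggy[0]["stats"])  # None -- statistics missing despite the request
print(fixed[0]["stats"])  # {'rows': 0}
```

Because the flag is silently ignored rather than rejected, the bug surfaces only as missing data downstream, which is why it went unnoticed after HIVE-24743.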
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776858 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:12 Start Date: 01/Jun/22 13:12 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886789104 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String destination) { AcidUtils.Operation.INSERT); } + private Context.Operation getWriteOperation(String destination) { Review Comment: Thanks for the explanation Issue Time Tracking --- Worklog Id: (was: 776858) Time Spent: 5h (was: 4h 50m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 5h > Remaining Estimate: 0h > > Currently virtual columns are fetched from iceberg tables if the statement > being executed is a delete or update statement and the setting is global. It > means it affects all tables affected by the statement. Also the read and > write schema depends on the operation setting. 
> Some statements fail due to invalid schema: > {code} > create external table tbl_ice(a int, b string, c int) stored by iceberg > stored as orc tblproperties ('format-version'='2'); > insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test cases logs. > org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, > vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task > failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt > 0 failed, info=[Error: Error while running task ( failure ) : > attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 15 more > Caused by: org.apache
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776857 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:10 Start Date: 01/Jun/22 13:10 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886787413 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws HiveException { } } + private void setWriteOperation(Configuration conf) { Review Comment: Thx Issue Time Tracking --- Worklog Id: (was: 776857) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.3
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-24484: Summary: Upgrade Hadoop to 3.3.3 (was: Upgrade Hadoop to 3.3.1) > Upgrade Hadoop to 3.3.3 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 11h 53m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776856 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 13:06 Start Date: 01/Jun/22 13:06 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886783428 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java: ## @@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas if (source instanceof TableScanOperator) { TableScanOperator ts = (TableScanOperator) source; // push down projections -ColumnProjectionUtils.appendReadColumns( -job, ts.getNeededColumnIDs(), ts.getNeededColumns(), ts.getNeededNestedColumnPaths()); +ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), ts.getNeededColumns(), +ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols()); Review Comment: Do we usually expose the config object of the operators for tasks? Issue Time Tracking --- Worklog Id: (was: 776856) Time Spent: 4h 40m (was: 4.5h)
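The FetchTask diff under review pushes the needed column ids and a new "has virtual columns" flag into the job configuration for downstream readers. A rough sketch of that push-down shape, with a plain Map standing in for Hadoop's Configuration (the virtual-columns key name is an assumption; Hive's real helper is ColumnProjectionUtils):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ProjectionPushDownSketch {

    // Serialize the projection into conf entries that record readers consume.
    static void appendReadColumns(Map<String, String> job, List<Integer> ids,
                                  List<String> names, boolean hasVirtualCols) {
        job.put("hive.io.file.readcolumn.ids",
                ids.stream().map(String::valueOf).collect(Collectors.joining(",")));
        job.put("hive.io.file.readcolumn.names", String.join(",", names));
        // The addition discussed in the review: signal on demand whether the
        // reader must also produce virtual columns (hypothetical key name).
        job.put("hive.io.file.read.virtual.columns", Boolean.toString(hasVirtualCols));
    }
}
```

The point of the patch is that the flag is derived from the table scan's own descriptor, so only scans that actually need virtual columns pay for fetching them.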
[jira] [Updated] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
[ https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26279: -- Labels: pull-request-available (was: ) > Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker > > > Key: HIVE-26279 > URL: https://issues.apache.org/jira/browse/HIVE-26279 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some tests in TestHiveMetaStoreClientApiArgumentsChecker are creating > requests but not really using them, so it is basically dead code that can be > removed.
[jira] [Work logged] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
[ https://issues.apache.org/jira/browse/HIVE-26279?focusedWorklogId=776851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776851 ] ASF GitHub Bot logged work on HIVE-26279: - Author: ASF GitHub Bot Created on: 01/Jun/22 12:56 Start Date: 01/Jun/22 12:56 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request, #3336: URL: https://github.com/apache/hive/pull/3336 ### What changes were proposed in this pull request? Remove useless code ### Why are the changes needed? Readability ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -Dtest=TestHiveMetaStoreClientApiArgumentsChecker` Issue Time Tracking --- Worklog Id: (was: 776851) Remaining Estimate: 0h Time Spent: 10m > Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker > > > Key: HIVE-26279 > URL: https://issues.apache.org/jira/browse/HIVE-26279 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
[ https://issues.apache.org/jira/browse/HIVE-26278?focusedWorklogId=776849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776849 ] ASF GitHub Bot logged work on HIVE-26278: - Author: ASF GitHub Bot Created on: 01/Jun/22 12:49 Start Date: 01/Jun/22 12:49 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request, #3335: URL: https://github.com/apache/hive/pull/3335 ### What changes were proposed in this pull request? New test cases for more code coverage. ### Why are the changes needed? Ensure that ValidWriteIdList is set when batching is involved in `Hive#getPartitionByNames`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -Dtest=TestHiveMetaStoreClientApiArgumentsChecker` Issue Time Tracking --- Worklog Id: (was: 776849) Remaining Estimate: 0h Time Spent: 10m > Add unit tests for Hive#getPartitionsByNames using batching > --- > > Key: HIVE-26278 > URL: https://issues.apache.org/jira/browse/HIVE-26278 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] > supports decomposing requests in batches but there are no unit tests > checking for the ValidWriteIdList when batching is used.
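The behaviour HIVE-26278 wants covered is the decomposition of one large partition-name request into fixed-size batches, with the ValidWriteIdList propagated to every batch, not just the first. A simplified model of that decomposition (the record and the String-valued write id list are assumptions, not Hive's actual request classes):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatchSketch {

    // Simplified stand-in for a metastore batch request.
    record BatchRequest(List<String> partNames, String validWriteIdList) { }

    static List<BatchRequest> toBatches(List<String> names, int batchSize, String validWriteIdList) {
        List<BatchRequest> batches = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            List<String> slice = names.subList(i, Math.min(i + batchSize, names.size()));
            // Every batch must carry the same write id list; forgetting it on
            // later batches is exactly the regression the unit tests guard.
            batches.add(new BatchRequest(slice, validWriteIdList));
        }
        return batches;
    }
}
```

A unit test along these lines would assert both the batch count and that the last batch still carries the write id list.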
[jira] [Updated] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
[ https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26278: -- Labels: pull-request-available (was: ) > Add unit tests for Hive#getPartitionsByNames using batching > --- > > Key: HIVE-26278 > URL: https://issues.apache.org/jira/browse/HIVE-26278 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns
[ https://issues.apache.org/jira/browse/HIVE-25421?focusedWorklogId=776847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776847 ] ASF GitHub Bot logged work on HIVE-25421: - Author: ASF GitHub Bot Created on: 01/Jun/22 12:44 Start Date: 01/Jun/22 12:44 Worklog Time Spent: 10m Work Description: szlta opened a new pull request, #3334: URL: https://github.com/apache/hive/pull/3334 As discussed in [HIVE-25420](https://issues.apache.org/jira/browse/HIVE-25420) the time column is not a native Hive type, reading it is more complicated, and it is not supported for vectorized read when the file format is ORC. Trying this currently results in an exception, so we should make an effort to gracefully fall back to non-vectorized reads when there's such a column in the query's projection. Issue Time Tracking --- Worklog Id: (was: 776847) Remaining Estimate: 0h Time Spent: 10m > Fallback from vectorization when reading Iceberg's time columns > --- > > Key: HIVE-25421 > URL: https://issues.apache.org/jira/browse/HIVE-25421 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As discussed in HIVE-25420 the time column is not a native Hive type, reading it is > more complicated, and is not supported for vectorized read. Trying this > currently results in an exception, so we should make an effort to > * either gracefully fall back to non-vectorized reads when there's such a > column in the query's projection > * or work around the reading issue on the execution side.
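The graceful fallback the PR describes amounts to a guard before enabling the vectorized reader: if any projected column has a type Hive cannot vectorize (Iceberg's time type here), take the row-by-row path instead of throwing. A minimal sketch of that check, with type names as plain strings (method and type-name spellings are illustrative assumptions, not the actual patch):

```java
import java.util.List;

public class VectorizationFallbackSketch {

    // Returns false when a projected column's type rules out vectorized reads,
    // so the caller can silently fall back to the non-vectorized reader.
    static boolean canUseVectorizedReader(List<String> projectedTypes) {
        for (String type : projectedTypes) {
            if ("time".equalsIgnoreCase(type)) {
                return false; // fall back rather than fail at execution time
            }
        }
        return true;
    }
}
```

The design choice matters: the query still succeeds when a time column is projected, it just runs without vectorization instead of surfacing an exception to the user.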
[jira] [Updated] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns
[ https://issues.apache.org/jira/browse/HIVE-25421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25421: -- Labels: pull-request-available (was: ) > Fallback from vectorization when reading Iceberg's time columns > --- > > Key: HIVE-25421 > URL: https://issues.apache.org/jira/browse/HIVE-25421 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Updated] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns
[ https://issues.apache.org/jira/browse/HIVE-25421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita updated HIVE-25421: -- Summary: Fallback from vectorization when reading Iceberg's time columns (was: Add support for reading Iceberg's time columns with vectorization turned on) > Fallback from vectorization when reading Iceberg's time columns > --- > > Key: HIVE-25421 > URL: https://issues.apache.org/jira/browse/HIVE-25421 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major >
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=776830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776830 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 01/Jun/22 12:02 Start Date: 01/Jun/22 12:02 Worklog Time Spent: 10m Work Description: steveloughran commented on PR #3279: URL: https://github.com/apache/hive/pull/3279#issuecomment-1143517119 jetty upgrade came in https://issues.apache.org/jira/browse/HADOOP-17796 & https://github.com/apache/hadoop/pull/3208 some security advisories there so it is probably better to deal with the change than try and stick to the older version. sorry Issue Time Tracking --- Worklog Id: (was: 776830) Time Spent: 11h 53m (was: 11h 43m) > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 11h 53m > Remaining Estimate: 0h >
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776826 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:57 Start Date: 01/Jun/22 11:57 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886717900 ## ql/src/java/org/apache/hadoop/hive/ql/security/authorization/HiveCustomStorageHandlerUtils.java: ## @@ -48,4 +54,13 @@ public static Map getTableProperties(Table table) { .ifPresent(tblProps::putAll); return tblProps; } + +public static Context.Operation operation(Configuration conf, String tableName) { Review Comment: Yes, this is what I did in my last commit. Only the method name is different :) Issue Time Tracking --- Worklog Id: (was: 776826) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776824 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:56 Start Date: 01/Jun/22 11:56 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886717199 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -616,6 +617,8 @@ protected void initializeOp(Configuration hconf) throws HiveException { initializeSpecPath(); fs = specPath.getFileSystem(hconf); + hconf.set(WRITE_OPERATION_CONFIG_PREFIX + getConf().getTableInfo().getTableName(), Review Comment: Moved both get/set write operation to HiveCustomStorageHandlerUtils. Issue Time Tracking --- Worklog Id: (was: 776824) Time Spent: 4h 20m (was: 4h 10m)
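The FileSinkOperator hunk above keys the write operation by table name (WRITE_OPERATION_CONFIG_PREFIX + table name) instead of using one global flag, which is how the patch stops a DELETE/UPDATE on one table from changing the read schema of every other table in the statement. A minimal sketch of that per-table keying, with a Map standing in for Hadoop's Configuration (prefix value and the "OTHER" default are assumptions, not Hive's actual constants):

```java
import java.util.Map;

public class WriteOperationConfigSketch {
    // Hypothetical prefix; the real constant lives in Hive's codebase.
    static final String WRITE_OPERATION_CONFIG_PREFIX = "hive.io.write.operation.";

    // Record the DML operation only for the table it targets.
    static void setWriteOperation(Map<String, String> conf, String tableName, String operation) {
        conf.put(WRITE_OPERATION_CONFIG_PREFIX + tableName, operation);
    }

    // Tables not targeted by the DML see a plain non-DML operation,
    // so their scans do not fetch delete/update virtual columns.
    static String getWriteOperation(Map<String, String> conf, String tableName) {
        return conf.getOrDefault(WRITE_OPERATION_CONFIG_PREFIX + tableName, "OTHER");
    }
}
```

In the tbl_ice repro from the issue description, the updated table would resolve to UPDATE while a table only read by the subquery would resolve to the default.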
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776822 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:55 Start Date: 01/Jun/22 11:55 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886716379 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java: ## @@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas if (source instanceof TableScanOperator) { TableScanOperator ts = (TableScanOperator) source; // push down projections -ColumnProjectionUtils.appendReadColumns( -job, ts.getNeededColumnIDs(), ts.getNeededColumns(), ts.getNeededNestedColumnPaths()); +ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), ts.getNeededColumns(), +ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols()); Review Comment: I don't see the benefit of exposing this on TSOperator. Maybe the call here would be shorter. Issue Time Tracking --- Worklog Id: (was: 776822) Time Spent: 4h 10m (was: 4h) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Currently virtual columns are fetched from iceberg tables if the statement > being executed is a delete or update statement and the setting is global. It > means it affects all tables affected by the statement. Also the read and > write schema depends on the operation setting. 
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776820 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:53 Start Date: 01/Jun/22 11:53 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886714583 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String destination) { AcidUtils.Operation.INSERT); } + private Context.Operation getWriteOperation(String destination) { Review Comment: No, these `destinations` are coming from the QueryParserInfo objects getQB().getParseInfo().getClauseNames().iterator().next(); and set in the UpdateDeleteSA like ``` rewrittenCtx.setOperation(Context.Operation.DELETE); rewrittenCtx.addDestNamePrefix(1, Context.DestClausePrefix.DELETE); ``` Issue Time Tracking --- Worklog Id: (was: 776820) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776818 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:49 Start Date: 01/Jun/22 11:49 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886711886 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws HiveException { } } + private void setWriteOperation(Configuration conf) { Review Comment: Moved both get/set to `HiveCustomStorageHandlerUtils`. Issue Time Tracking --- Worklog Id: (was: 776818) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work started] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
[ https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26279 started by Stamatis Zampetakis. -- > Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker > > > Key: HIVE-26279 > URL: https://issues.apache.org/jira/browse/HIVE-26279 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > > Some tests in TestHiveMetaStoreClientApiArgumentsChecker are creating a > request but not really using them so it is basically dead code that can be > removed. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
[ https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-26279: --
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=776814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776814 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:42 Start Date: 01/Jun/22 11:42 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r886705645 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java: ## @@ -99,7 +99,8 @@ public int execute() throws HiveException { createTableNonReplaceMode(tbl); } -DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); + DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); Review Comment: what changed here, extra space? Issue Time Tracking --- Worklog Id: (was: 776814) Time Spent: 1h 50m (was: 1h 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
[ https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26278: --- Parent: HIVE-21637 Issue Type: Sub-task (was: Task) > Add unit tests for Hive#getPartitionsByNames using batching > --- > > Key: HIVE-26278 > URL: https://issues.apache.org/jira/browse/HIVE-26278 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155] > supports decomposing requests in batches but there are no unit tests > checking for the ValidWriteIdList when batching is used. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
[ https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544826#comment-17544826 ] Stamatis Zampetakis commented on HIVE-25936: New unit tests to be added as part of HIVE-26278 > ValidWriteIdList & table id are sometimes missing when requesting partitions > by name via HS2 > > > Key: HIVE-25936 > URL: https://issues.apache.org/jira/browse/HIVE-25936 > Project: Hive > Issue Type: Sub-task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > Time Spent: 40m > Remaining Estimate: 0h > > According to HIVE-24743 the table id and {{ValidWriteIdList}} are important > for keeping HMS remote metadata cache consistent. Although HIVE-24743 > attempted to pass the write id list and table id in every call to HMS it > failed to do so completely. For those partitions not handled in the batch > logic, the [metastore > call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161] > in {{Hive#getPartitionsByName}} method does not pass the table id and write > id list. -- This message was sent by Atlassian Jira (v8.20.7#820007)
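[Editor's note] The request decomposition discussed in HIVE-26278/HIVE-25936 can be illustrated with a small, self-contained sketch. This is not Hive code: the class and method names are hypothetical, and a plain String stands in for the ValidWriteIdList; the point is that partition names are split into fixed-size batches and the same write-id list must be forwarded with every batch call, not only the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch of the batching pattern: split a long
// list of partition names into fixed-size chunks so each metastore call
// stays bounded. The caller is responsible for passing the same
// ValidWriteIdList (and table id) alongside every resulting batch.
public class PartitionBatcher {

    public static List<List<String>> toBatches(List<String> partNames, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partNames.size(); i += batchSize) {
            int end = Math.min(i + batchSize, partNames.size());
            // copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(partNames.subList(i, end)));
        }
        return batches;
    }
}
```

The bug in HIVE-25936 was precisely that partitions falling outside this batch loop took a different metastore call that dropped the write-id list and table id.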
[jira] [Work started] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
[ https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26278 started by Stamatis Zampetakis. --
[jira] [Assigned] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching
[ https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-26278: --
[jira] [Work logged] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
[ https://issues.apache.org/jira/browse/HIVE-25936?focusedWorklogId=776800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776800 ] ASF GitHub Bot logged work on HIVE-25936: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:18 Start Date: 01/Jun/22 11:18 Worklog Time Spent: 10m Work Description: zabetak closed pull request #3007: HIVE-25936: ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2 URL: https://github.com/apache/hive/pull/3007 Issue Time Tracking --- Worklog Id: (was: 776800) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
[ https://issues.apache.org/jira/browse/HIVE-25936?focusedWorklogId=776799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776799 ] ASF GitHub Bot logged work on HIVE-25936: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:18 Start Date: 01/Jun/22 11:18 Worklog Time Spent: 10m Work Description: zabetak commented on PR #3007: URL: https://github.com/apache/hive/pull/3007#issuecomment-1143474215 I am closing this PR down since the bug was fixed as part of HIVE-25935. I will open new PRs for the additional refactoring and the tests. Issue Time Tracking --- Worklog Id: (was: 776799) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
[ https://issues.apache.org/jira/browse/HIVE-26095?focusedWorklogId=776793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776793 ] ASF GitHub Bot logged work on HIVE-26095: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:00 Start Date: 01/Jun/22 11:00 Worklog Time Spent: 10m Work Description: zabetak closed pull request #3156: HIVE-26095: Add queryid in QueryLifeTimeHookContext URL: https://github.com/apache/hive/pull/3156 Issue Time Tracking --- Worklog Id: (was: 776793) Time Spent: 2h 20m (was: 2h 10m) > Add queryid in QueryLifeTimeHookContext > --- > > Key: HIVE-26095 > URL: https://issues.apache.org/jira/browse/HIVE-26095 > Project: Hive > Issue Type: New Feature > Components: Hooks >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > A > [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java] > is executed various times in the life-cycle of a query but it is not always > possible to obtain the id of the query. The query id is inside the > {{HookContext}} but the latter is not always available notably during > compilation. > The query id is useful for many purposes as it is the only way to uniquely > identify the query/command that is currently running. It is also the only way > to match together events appearing in before and after methods. > The goal of this jira is to add the query id in > [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java] > and make it available during all life-cycle events. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2
[ https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-25936. Fix Version/s: 4.0.0-alpha-1 Resolution: Fixed The bug reported here was fixed by the cleanup done in HIVE-25935 so I am marking this JIRA as fixed. The PR contains some useful refactoring and tests but I will track them down under new PRs/JIRAs.
[jira] [Resolved] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
[ https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-26095. Resolution: Won't Fix As discussed under the [PR|https://github.com/apache/hive/pull/3156#discussion_r840394149] it is possible to obtain the query id via the Hive configuration (using {{hive.query.id}} property) so there is no need to introduce a new API for this purpose thus I am closing this JIRA as won't fix.
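[Editor's note] The resolution above notes that a hook can already obtain the running query's id from the session configuration under the `hive.query.id` property. A minimal sketch of that lookup, using `java.util.Properties` as a stand-in for Hive's configuration object (the `QueryIdLookup` class itself is hypothetical, not a Hive API):

```java
import java.util.Properties;

// Hypothetical stand-in: a hook that has access to the session
// configuration can read the current query id from "hive.query.id"
// instead of requiring a new field on the hook context.
public class QueryIdLookup {

    public static String queryId(Properties sessionConf) {
        // returns null when no query is associated with the session
        return sessionConf.getProperty("hive.query.id");
    }
}
```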
[jira] [Closed] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
[ https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis closed HIVE-26095. --
[jira] [Work logged] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
[ https://issues.apache.org/jira/browse/HIVE-26095?focusedWorklogId=776792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776792 ] ASF GitHub Bot logged work on HIVE-26095: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:00 Start Date: 01/Jun/22 11:00 Worklog Time Spent: 10m Work Description: zabetak commented on PR #3156: URL: https://github.com/apache/hive/pull/3156#issuecomment-1143454631 As discussed [previously](https://github.com/apache/hive/pull/3156#discussion_r840394149), there is no need to introduce a new API since it is possible to achieve the same result via the Hive configuration, so I am closing this PR. Issue Time Tracking --- Worklog Id: (was: 776792) Time Spent: 2h 10m (was: 2h) > Add queryid in QueryLifeTimeHookContext > --- > > Key: HIVE-26095 > URL: https://issues.apache.org/jira/browse/HIVE-26095 > Project: Hive > Issue Type: New Feature > Components: Hooks > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 2h 10m > Remaining Estimate: 0h
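The configuration-based alternative that the closing comment refers to can be sketched as follows. This is an illustrative sketch, not Hive code: the `Conf` map stands in for `HiveConf`, and `hive.query.id` is assumed to be the key the driver populates before compilation starts.

```java
import java.util.HashMap;

// Stand-in for HiveConf: a plain string-to-string map (assumption of this sketch).
class Conf extends HashMap<String, String> {
    private static final long serialVersionUID = 1L;
}

// Sketch: read the query id from the session configuration instead of a new
// QueryLifeTimeHookContext getter, so it is reachable even during compilation
// when no HookContext exists yet.
class QueryIdFromConf {
    static final String QUERY_ID_KEY = "hive.query.id"; // assumed config key

    static String queryId(Conf conf) {
        return conf.getOrDefault(QUERY_ID_KEY, "<unset>");
    }
}
```

A hook can therefore look up the id in its before-compile callback from the configuration it already holds, which is why the new API was deemed unnecessary.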
[jira] [Updated] (HIVE-26095) Add queryid in QueryLifeTimeHookContext
[ https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26095: --- Fix Version/s: (was: 4.0.0-alpha-2) > Add queryid in QueryLifeTimeHookContext > --- > > Key: HIVE-26095 > URL: https://issues.apache.org/jira/browse/HIVE-26095 > Project: Hive > Issue Type: New Feature > Components: Hooks > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch
[ https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=776772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776772 ] ASF GitHub Bot logged work on HIVE-26196: - Author: ASF GitHub Bot Created on: 01/Jun/22 10:15 Start Date: 01/Jun/22 10:15 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3254: URL: https://github.com/apache/hive/pull/3254#issuecomment-1143408741 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell, 0.0% Coverage, 0.0% Duplication ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3254)). Issue Time Tracking --- Worklog Id: (was: 776772) Time Spent: 50m (was: 40m) > Integrate Sonar analysis for the master branch > -- > > Key: HIVE-26196 > URL: https://issues.apache.org/jira/browse/HIVE-26196 > Project: Hive > 
Issue Type: Improvement > Components: Build Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The aim of the ticket is to integrate SonarCloud analysis for the master > branch. > The ticket does not cover: > * test coverage > * analysis on PRs and other branches > Those aspects can be added in follow-up tickets, if there is enough interest. > From preliminary tests, the analysis step requires 30 additional minutes for > the pipeline. > The idea for this first integration is to track code quality metrics over
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776771 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 10:13 Start Date: 01/Jun/22 10:13 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886633690 ## ql/src/java/org/apache/hadoop/hive/ql/security/authorization/HiveCustomStorageHandlerUtils.java: ## @@ -48,4 +54,13 @@ public static Map getTableProperties(Table table) { .ifPresent(tblProps::putAll); return tblProps; } + +public static Context.Operation operation(Configuration conf, String tableName) { Review Comment: So maybe another method like: `HiveCustomStorageHandlerUtils.setOperation(hconf, tableName, operation)`? Issue Time Tracking --- Worklog Id: (was: 776771) Time Spent: 3h 40m (was: 3.5h) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, virtual columns are fetched from Iceberg tables when the statement being executed is a delete or update, and the setting is global: it affects every table touched by the statement. The read and write schema also depend on the operation setting. 
> Some statements fail due to invalid schema: > {code} > create external table tbl_ice(a int, b string, c int) stored by iceberg > stored as orc tblproperties ('format-version'='2'); > insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test case logs. > org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, > vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task > failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt > 0 failed, info=[Error: Error while running task ( failure ) : > attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) >
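The helper pvary suggests in the review above — a `setOperation(hconf, tableName, operation)` companion to the existing getter on `HiveCustomStorageHandlerUtils` — would keep both sides of the per-table operation round trip in one class. A hedged sketch, where the `Configuration` map, the class name, and the prefix value are illustrative stand-ins rather than the actual Hive implementation:

```java
import java.util.HashMap;

// Stand-in for Hadoop's Configuration: a plain string-to-string map.
class Configuration extends HashMap<String, String> {
    private static final long serialVersionUID = 1L;
}

// Sketch of the proposed symmetric setter/getter pair: the write operation is
// stored per table under a prefixed configuration key, and read back the same
// way, so callers never build the key themselves.
class StorageHandlerOperationUtils {
    // Assumed prefix; the PR uses a WRITE_OPERATION_CONFIG_PREFIX constant.
    static final String WRITE_OPERATION_CONFIG_PREFIX = "hive.io.write.operation.";

    static void setOperation(Configuration conf, String tableName, String operation) {
        conf.put(WRITE_OPERATION_CONFIG_PREFIX + tableName, operation);
    }

    static String getOperation(Configuration conf, String tableName) {
        // Default to OTHER when no operation was recorded for this table.
        return conf.getOrDefault(WRITE_OPERATION_CONFIG_PREFIX + tableName, "OTHER");
    }
}
```

Because the key is derived from the table name, the setting stays scoped to one table instead of being global, which is exactly the per-table behavior the ticket asks for.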
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776770 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 10:12 Start Date: 01/Jun/22 10:12 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886632668 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String destination) { AcidUtils.Operation.INSERT); } + private Context.Operation getWriteOperation(String destination) { Review Comment: Is this reading the operation set by `HiveCustomStorageHandlerUtils.setWrite(hconf, tableName)`? Issue Time Tracking --- Worklog Id: (was: 776770) Time Spent: 3.5h (was: 3h 20m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776768 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 10:09 Start Date: 01/Jun/22 10:09 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886629784 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -616,6 +617,8 @@ protected void initializeOp(Configuration hconf) throws HiveException { initializeSpecPath(); fs = specPath.getFileSystem(hconf); + hconf.set(WRITE_OPERATION_CONFIG_PREFIX + getConf().getTableInfo().getTableName(), Review Comment: Could we just do this like: `HiveCustomStorageHandlerUtils.setWrite(hconf, tableName)`? Issue Time Tracking --- Worklog Id: (was: 776768) Time Spent: 3h 20m (was: 3h 10m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776738 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:29 Start Date: 01/Jun/22 09:29 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886591933 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java: ## @@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas if (source instanceof TableScanOperator) { TableScanOperator ts = (TableScanOperator) source; // push down projections -ColumnProjectionUtils.appendReadColumns( -job, ts.getNeededColumnIDs(), ts.getNeededColumns(), ts.getNeededNestedColumnPaths()); +ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), ts.getNeededColumns(), +ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols()); Review Comment: nit: Shall we expose `hasVirtualCols` on TSOperator instead of exposing and using the `conf`? Issue Time Tracking --- Worklog Id: (was: 776738) Time Spent: 3h 10m (was: 3h) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h
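The review nit about `ts.conf.hasVirtualCols()` amounts to adding a one-line delegating accessor on the operator so callers such as FetchTask never touch its `conf` field. A minimal sketch with simplified stand-in classes (not the real Hive types):

```java
// Stand-in for the operator descriptor holding the virtual-columns flag.
class TableScanDesc {
    private final boolean hasVirtualCols;

    TableScanDesc(boolean hasVirtualCols) {
        this.hasVirtualCols = hasVirtualCols;
    }

    boolean hasVirtualCols() {
        return hasVirtualCols;
    }
}

// Stand-in for the table scan operator: conf stays an implementation detail.
class TableScanOperator {
    private final TableScanDesc conf;

    TableScanOperator(TableScanDesc conf) {
        this.conf = conf;
    }

    // Delegating accessor suggested in review: callers ask the operator
    // directly instead of reaching into conf.
    boolean hasVirtualCols() {
        return conf.hasVirtualCols();
    }
}
```

With this, the FetchTask call site would read `ts.hasVirtualCols()` and the descriptor field can stay private.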
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776736 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:28 Start Date: 01/Jun/22 09:28 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886590600 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map implements CloseableIterator { Review Comment: nit: maybe move this class to the IcebergAcidUtil, so we do not have to use magic numbers, like `4`? Issue Time Tracking --- Worklog Id: (was: 776736) Time Spent: 3h (was: 2h 50m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h > Remaining Estimate: 0h
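Moving the iterator next to `IcebergAcidUtil`, as the nit above suggests, would let the magic number `4` become a named constant documented in one place. A minimal sketch, assuming a row layout with four leading meta columns before the data columns (the class name, constant name, and value are illustrative, not the actual handler code):

```java
// Sketch: replace a bare positional offset with a named constant so the
// meta-column layout is defined in a single utility class.
class IcebergAcidUtilSketch {
    // Assumed layout: 4 leading meta columns precede the data columns.
    static final int META_COLUMN_COUNT = 4;

    // Position of a data column inside the combined (meta + data) row.
    static int dataColumnIndex(int posInDataSchema) {
        return META_COLUMN_COUNT + posInDataSchema;
    }
}
```

Any iterator or projection code can then call `dataColumnIndex(...)` instead of hard-coding `4`, so a future layout change touches one constant.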
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776734 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:27 Start Date: 01/Jun/22 09:27 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886590600 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map implements CloseableIterator { Review Comment: nit: maybe move this to the IcebergAcidUtil, so we do not have to use magic numbers, like `4`? Issue Time Tracking --- Worklog Id: (was: 776734) Time Spent: 2h 50m (was: 2h 40m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776732 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:21 Start Date: 01/Jun/22 09:21 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886584030 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws HiveException { } } + private void setWriteOperation(Configuration conf) { Review Comment: Would it make sense to keep the read/set part in the same class? Issue Time Tracking --- Worklog Id: (was: 776732) Time Spent: 2h 40m (was: 2.5h)
[jira] [Resolved] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled
[ https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25907. --- Resolution: Fixed Pushed to master. Thanks for the fix [~srahman]! > IOW Directory queries fails to write data to final path when query result > cache is enabled > -- > > Key: HIVE-25907 > URL: https://issues.apache.org/jira/browse/HIVE-25907 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > INSERT OVERWRITE DIRECTORY queries fail to write the data to the specified > directory location when query result cache is enabled. > *Steps to reproduce* > {code:java} > 1. create a data file with the following data > 1 abc 10.5 > 2 def 11.5 > 2. create table pointing to that data > create external table iowd(strct struct) > row format delimited > fields terminated by '\t' > collection items terminated by ' ' > location ''; > 3. run the following query > set hive.query.results.cache.enabled=true; > INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd; > {code} > After execution of the above query, it is expected that the destination > directory contains data from the table iowd, but due to HIVE-21386 this is > no longer the case. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled
[ https://issues.apache.org/jira/browse/HIVE-25907?focusedWorklogId=776729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776729 ] ASF GitHub Bot logged work on HIVE-25907: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:18 Start Date: 01/Jun/22 09:18 Worklog Time Spent: 10m Work Description: pvary merged PR #2978: URL: https://github.com/apache/hive/pull/2978 Issue Time Tracking --- Worklog Id: (was: 776729) Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776726 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 09:07 Start Date: 01/Jun/22 09:07 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886570597 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws HiveException { } } + private void setWriteOperation(Configuration conf) { Review Comment: Moved the read part to InputFormatConfig.java, but it is still set in FileSinkOperator. Issue Time Tracking --- Worklog Id: (was: 776726) Time Spent: 2.5h (was: 2h 20m)
[jira] [Updated] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes
[ https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26277: Component/s: Standalone Metastore > Add unit tests for ColumnStatsAggregator classes > > > Key: HIVE-26277 > URL: https://issues.apache.org/jira/browse/HIVE-26277 > Project: Hive > Issue Type: Test > Components: Standalone Metastore, Statistics, Tests >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > > We have no unit tests covering these classes, which also happen to contain > some complicated logic, making the absence of tests even more risky.
[jira] [Work started] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes
[ https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26277 started by Alessandro Solimando. ---
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776719 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 08:38 Start Date: 01/Jun/22 08:38 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886542319 ## ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java: ## @@ -932,7 +940,9 @@ protected void createBucketForFileIdx(FSPaths fsp, int filesIdx) && !FileUtils.mkdir(fs, outPath.getParent(), hconf)) { LOG.warn("Unable to create directory with inheritPerms: " + outPath); } -fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jc, conf.getTableInfo(), +JobConf jobConf = new JobConf(jc); +setWriteOperation(jobConf); +fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jobConf, conf.getTableInfo(), Review Comment: Changing the method signature would alter the API for all file formats Hive supports. Issue Time Tracking --- Worklog Id: (was: 776719) Time Spent: 2h 20m (was: 2h 10m)
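[Editor's note] The diff quoted in the worklog above copies the shared JobConf (`new JobConf(jc)`) before setting the write operation, so the property is scoped to a single record writer rather than widening the `getHiveRecordWriter` signature for every file format. The following is a minimal stand-alone sketch of that copy-then-set pattern; it uses `java.util.Properties` as a stand-in for Hadoop's `JobConf`, and the property key is hypothetical:

```java
import java.util.Properties;

public class ScopedConfigSketch {
    // Hypothetical key; the real write-operation key lives in Hive's config classes.
    static final String WRITE_OPERATION_KEY = "example.write.operation";

    // Copy the shared configuration and set the operation only on the copy,
    // so other writers reading the original configuration are unaffected
    // (analogous to `new JobConf(jc)` followed by `setWriteOperation(jobConf)`).
    static Properties withWriteOperation(Properties shared, String operation) {
        Properties copy = new Properties();
        copy.putAll(shared);
        copy.setProperty(WRITE_OPERATION_KEY, operation);
        return copy;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        Properties perWriter = withWriteOperation(shared, "UPDATE");
        System.out.println(perWriter.getProperty(WRITE_OPERATION_KEY)); // UPDATE
        System.out.println(shared.containsKey(WRITE_OPERATION_KEY));    // false
    }
}
```

The trade-off discussed in the review applies generally: a defensive copy costs one allocation per writer, but avoids changing a widely implemented interface.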
[jira] [Assigned] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes
[ https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26277: ---
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776717 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 08:36 Start Date: 01/Jun/22 08:36 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886540554 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String destination) { AcidUtils.Operation.INSERT); } + private Context.Operation getWriteOperation(String destination) { Review Comment: The value in `destination` has a changing part and cannot easily be mapped to an enum constant. Issue Time Tracking --- Worklog Id: (was: 776717) Time Spent: 2h 10m (was: 2h)
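[Editor's note] For context on the review comment above: when a destination token carries a variable suffix, a direct `Enum.valueOf` lookup fails, which is one reason a helper method is preferable to a plain enum mapping. The following is a generic, self-contained sketch of prefix-based matching; the token shapes and enum names are illustrative assumptions, not Hive's actual `Context.Operation` values:

```java
import java.util.Locale;

public class PrefixMatchSketch {
    // Hypothetical operation kinds for illustration only.
    enum Op { INSERT, UPDATE, DELETE, OTHER }

    // Map a destination-style token such as "insclause-0" or "update-3"
    // to an operation by matching on its stable prefix; the numeric
    // suffix varies per clause, so Enum.valueOf cannot be used directly.
    static Op fromDestination(String destination) {
        String d = destination.toLowerCase(Locale.ROOT);
        if (d.startsWith("update")) {
            return Op.UPDATE;
        }
        if (d.startsWith("delete")) {
            return Op.DELETE;
        }
        if (d.startsWith("insclause")) {
            return Op.INSERT;
        }
        return Op.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(fromDestination("insclause-0")); // INSERT
        System.out.println(fromDestination("update-12"));   // UPDATE
    }
}
```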
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776716 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 08:35 Start Date: 01/Jun/22 08:35 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886538949 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -11433,6 +11435,7 @@ private Operator genTablePlan(String alias, QB qb) throws SemanticException { // Determine row schema for TSOP. // Include column names from SerDe, the partition and virtual columns. rwsch = new RowResolver(); + Review Comment: reverted Issue Time Tracking --- Worklog Id: (was: 776716) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776714 ] ASF GitHub Bot logged work on HIVE-26264: - Author: ASF GitHub Bot Created on: 01/Jun/22 08:33 Start Date: 01/Jun/22 08:33 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3324: URL: https://github.com/apache/hive/pull/3324#discussion_r886537498 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java: ## @@ -259,22 +258,27 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) { this.inMemoryDataModel = conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL, InputFormatConfig.InMemoryDataModel.GENERIC); this.currentIterator = open(tasks.next(), expectedSchema).iterator(); - Operation operation = HiveIcebergStorageHandler.operation(conf, conf.get(Catalogs.NAME)); - this.updateOrDelete = Operation.DELETE.equals(operation) || Operation.UPDATE.equals(operation); + this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf); } @Override public boolean nextKeyValue() throws IOException { while (true) { if (currentIterator.hasNext()) { current = currentIterator.next(); - if (updateOrDelete) { + if (fetchVirtualColumns) { GenericRecord rec = (GenericRecord) current; PositionDeleteInfo.setIntoConf(conf, IcebergAcidUtil.parseSpecId(rec), IcebergAcidUtil.computePartitionHash(rec), IcebergAcidUtil.parseFilePath(rec), IcebergAcidUtil.parseFilePosition(rec)); +GenericRecord tmp = GenericRecord.create( Review Comment: Created a separate class to handle and wrap this. 
`VirtualColumnAwareIterator` Issue Time Tracking --- Worklog Id: (was: 776714) Time Spent: 1h 50m (was: 1h 40m) > Iceberg integration: Fetch virtual columns on demand > > > Key: HIVE-26264 > URL: https://issues.apache.org/jira/browse/HIVE-26264 > Project: Hive > Issue Type: Bug > Components: File Formats >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently virtual columns are fetched from iceberg tables if the statement > being executed is a delete or update statement and the setting is global. It > means it affects all tables affected by the statement. Also the read and > write schema depends on the operation setting. > Some statements fails due to invalid schema: > {code} > create external table tbl_ice(a int, b string, c int) stored by iceberg > stored as orc tblproperties ('format-version'='2'); > insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), > (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56); > update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4); > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test cases logs. 
> org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt
> 0 failed, info=[Error: Error while running task ( failure ) :
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>         at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunne
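The review thread above replaces inline virtual-column handling with a wrapping iterator (`VirtualColumnAwareIterator`). Below is a minimal, self-contained sketch of that decorator pattern, not the actual Hive/Iceberg class: plain `List<Object>` rows stand in for Iceberg's `GenericRecord`, a callback stands in for `PositionDeleteInfo.setIntoConf`, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the wrapping idea: decorate the raw record iterator so that for
// each row the leading virtual columns (spec id, partition hash, file path,
// file position in the real code -- hence 4 in the diff) are published
// out-of-band, and only the data columns are yielded to the caller.
public class VirtualColumnAwareIteratorSketch implements Iterator<List<Object>> {
  private final Iterator<List<Object>> delegate;
  private final int virtualColumnCount;
  private final Consumer<List<Object>> virtualColumnSink;

  public VirtualColumnAwareIteratorSketch(Iterator<List<Object>> delegate,
                                          int virtualColumnCount,
                                          Consumer<List<Object>> virtualColumnSink) {
    this.delegate = delegate;
    this.virtualColumnCount = virtualColumnCount;
    this.virtualColumnSink = virtualColumnSink;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public List<Object> next() {
    List<Object> row = delegate.next();
    // Hand the virtual columns to the sink (copied so no view of the row leaks).
    virtualColumnSink.accept(new ArrayList<>(row.subList(0, virtualColumnCount)));
    // Yield only the data columns, matching the expected read schema.
    return new ArrayList<>(row.subList(virtualColumnCount, row.size()));
  }
}
```

The benefit the reviewer is after is separation of concerns: the input format's `nextKeyValue()` no longer needs to know how virtual columns are parsed or stored; it just consumes the wrapped iterator.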
[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand
[ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776711 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jun/22 08:32
            Start Date: 01/Jun/22 08:32
    Worklog Time Spent: 10m
      Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886536461

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -259,22 +258,27 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) {
       this.inMemoryDataModel = conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
           InputFormatConfig.InMemoryDataModel.GENERIC);
       this.currentIterator = open(tasks.next(), expectedSchema).iterator();
-      Operation operation = HiveIcebergStorageHandler.operation(conf, conf.get(Catalogs.NAME));
-      this.updateOrDelete = Operation.DELETE.equals(operation) || Operation.UPDATE.equals(operation);
+      this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
     }

     @Override
     public boolean nextKeyValue() throws IOException {
       while (true) {
         if (currentIterator.hasNext()) {
           current = currentIterator.next();
-          if (updateOrDelete) {
+          if (fetchVirtualColumns) {
             GenericRecord rec = (GenericRecord) current;
             PositionDeleteInfo.setIntoConf(conf,
                 IcebergAcidUtil.parseSpecId(rec),
                 IcebergAcidUtil.computePartitionHash(rec),
                 IcebergAcidUtil.parseFilePath(rec),
                 IcebergAcidUtil.parseFilePosition(rec));
+            GenericRecord tmp = GenericRecord.create(
+                new Schema(expectedSchema.columns().subList(4, expectedSchema.columns().size())));
+            for (int i = 4; i < expectedSchema.columns().size(); ++i) {

Review Comment:
   Moved

Issue Time Tracking
-------------------

    Worklog Id:     (was: 776711)
    Time Spent: 1h 40m  (was: 1.5h)

> Iceberg integration: Fetch virtual columns on demand
> ----------------------------------------------------
>
>                 Key: HIVE-26264
>                 URL: https://issues.apache.org/jira/browse/HIVE-26264
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables only if the
> statement being executed is a delete or update statement, and the setting is
> global: it affects every table touched by the statement. The read and write
> schema also depends on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52),
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log,
> or check ./ql/target/surefire-reports or
> ./itests/qtest/target/surefire-reports/ for specific test case logs.
> org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt
> 0 failed, info=[Error: Error while running task ( failure ) :
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2
[jira] [Work logged] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled
[ https://issues.apache.org/jira/browse/HIVE-25907?focusedWorklogId=776704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776704 ]

ASF GitHub Bot logged work on HIVE-25907:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jun/22 07:51
            Start Date: 01/Jun/22 07:51
    Worklog Time Spent: 10m
      Work Description: shameersss1 commented on PR #2978:
URL: https://github.com/apache/hive/pull/2978#issuecomment-1143238349

   @pvary - Thanks for the review. Are we good to merge this PR?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 776704)
    Time Spent: 4h 10m  (was: 4h)

> IOW Directory queries fails to write data to final path when query result
> cache is enabled
> --------------------------------------------------------------------------
>
>                 Key: HIVE-25907
>                 URL: https://issues.apache.org/jira/browse/HIVE-25907
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write their data to the specified
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After executing the above query, the destination directory is expected to
> contain the data from table iowd, but due to HIVE-21386 this no longer
> happens.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)