[jira] [Resolved] (HIVE-26274) No vectorization if query has upper case window function

2022-06-01 Thread Krisztian Kasa (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-26274.
---
Resolution: Fixed

Pushed to master. Thanks [~abstractdog] for review.

> No vectorization if query has upper case window function
> 
>
> Key: HIVE-26274
> URL: https://issues.apache.org/jira/browse/HIVE-26274
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b int);
> EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
> {code}
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   Vertices:
> Map 1 
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vector.serde.deserialize IS true
> inputFormatFeatureSupport: [DECIMAL_64]
> featureSupportInUse: [DECIMAL_64]
> inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
> allNative: true
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez] IS true
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> vectorized: false
>   Stage: Stage-0
> Fetch Operator
> {code}
> {code}
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> {code}
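
An illustrative workaround sketch (not from the ticket): the reason string above compares the upper-case name against a lower-case supported-function list, so on unpatched builds the lower-case spelling of the function should still vectorize:

{code:sql}
-- Hypothetical workaround, assuming the lookup is case-sensitive as the
-- notVectorizedReason above suggests: spell the window function in lower case.
EXPLAIN VECTORIZATION ONLY SELECT row_number() OVER (ORDER BY a) AS rn FROM t1;
{code}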





[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=777296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777296 ]

ASF GitHub Bot logged work on HIVE-26274:
-

Author: ASF GitHub Bot
Created on: 02/Jun/22 06:54
Start Date: 02/Jun/22 06:54
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #3332:
URL: https://github.com/apache/hive/pull/3332




Issue Time Tracking
---

Worklog Id: (was: 777296)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=777293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777293 ]

ASF GitHub Bot logged work on HIVE-26274:
-

Author: ASF GitHub Bot
Created on: 02/Jun/22 06:46
Start Date: 02/Jun/22 06:46
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on PR #3332:
URL: https://github.com/apache/hive/pull/3332#issuecomment-1144501074

   LGTM, thanks for the patch @kasakrisz 




Issue Time Tracking
---

Worklog Id: (was: 777293)
Time Spent: 20m  (was: 10m)



[jira] [Updated] (HIVE-26285) Overwrite database metadata on original source in optimised failover.

2022-06-01 Thread Haymant Mangla (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haymant Mangla updated HIVE-26285:
--
Parent: HIVE-25699
Issue Type: Sub-task  (was: Bug)

> Overwrite database metadata on original source in optimised failover.
> -
>
> Key: HIVE-26285
> URL: https://issues.apache.org/jira/browse/HIVE-26285
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>






[jira] [Assigned] (HIVE-26285) Overwrite database metadata on original source in optimised failover.

2022-06-01 Thread Haymant Mangla (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haymant Mangla reassigned HIVE-26285:
-




[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=777258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777258 ]

ASF GitHub Bot logged work on HIVE-21160:
-

Author: ASF GitHub Bot
Created on: 02/Jun/22 02:42
Start Date: 02/Jun/22 02:42
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #2855:
URL: https://github.com/apache/hive/pull/2855#discussion_r877908690


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -3392,11 +3392,19 @@ public static enum ConfVars {
 MERGE_CARDINALITY_VIOLATION_CHECK("hive.merge.cardinality.check", true,
   "Set to true to ensure that each SQL Merge statement ensures that for each row in the target\n" +
 "table there is at most 1 matching row in the source table per SQL Specification."),
+SPLIT_UPDATE("hive.split.update", true,

Review Comment:
   When updating larger datasets the split update can perform better, and it is 
also a precondition for enabling updates to partitioning and bucketing keys.
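
   For illustration, a rough sketch of the rewrite being discussed (the real 
text is generated by the semantic analyzer; the tables and predicate here are 
hypothetical):

{code:sql}
-- Hypothetical shape of the early-split rewrite: an UPDATE such as
--   UPDATE t SET b = 0 WHERE a > 10;
-- becomes a single-scan multi-insert whose first branch feeds the delete
-- side (sorted by ROW__ID) and whose second branch carries the new values.
FROM t
INSERT INTO t_delete_side SELECT t.ROW__ID WHERE t.a > 10 SORT BY t.ROW__ID
INSERT INTO t_insert_side SELECT t.a, 0 WHERE t.a > 10;
{code}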





Issue Time Tracking
---

Worklog Id: (was: 777258)
Time Spent: 3h 20m  (was: 3h 10m)

> Rewrite Update statement as Multi-insert and do Update split early
> --
>
> Key: HIVE-21160
> URL: https://issues.apache.org/jira/browse/HIVE-21160
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=777254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777254 ]

ASF GitHub Bot logged work on HIVE-21160:
-

Author: ASF GitHub Bot
Created on: 02/Jun/22 02:32
Start Date: 02/Jun/22 02:32
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #2855:
URL: https://github.com/apache/hive/pull/2855#discussion_r887446340


##
ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java:
##
@@ -470,8 +405,7 @@ private String handleUpdate(ASTNode whenMatchedUpdateClause, StringBuilder rewri
   rewrittenQueryStr.append(" AND NOT(").append(deleteExtraPredicate).append(")");
 }
 if(!splitUpdateEarly) {
-  rewrittenQueryStr.append("\n SORT BY ");
-  rewrittenQueryStr.append(targetName).append(".ROW__ID ");
+  appendSortBy(rewrittenQueryStr, Collections.singletonList(targetName + ".ROW__ID "));
 }
 rewrittenQueryStr.append("\n");

Review Comment:
   Added logging of the rewritten AST in RewriteSA before passing it to super.analyze.





Issue Time Tracking
---

Worklog Id: (was: 777254)
Time Spent: 3h 10m  (was: 3h)



[jira] [Commented] (HIVE-21304) Make bucketing version usage more robust

2022-06-01 Thread katty he (Jira)


[ https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545226#comment-17545226 ]

katty he commented on HIVE-21304:
-

Can this patch be used on Hive 3.1.2? When I apply it to Hive 3.1.2, the test 
testMergeOnTezEdges does not pass. Is there some other patch I should pick?

> Make bucketing version usage more robust
> 
>
> Key: HIVE-21304
> URL: https://issues.apache.org/jira/browse/HIVE-21304
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, 
> HIVE-21304.03.patch, HIVE-21304.04.patch, HIVE-21304.05.patch, 
> HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, 
> HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch, 
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, 
> HIVE-21304.15.patch, HIVE-21304.16.patch, HIVE-21304.17.patch, 
> HIVE-21304.18.patch, HIVE-21304.19.patch, HIVE-21304.20.patch, 
> HIVE-21304.21.patch, HIVE-21304.22.patch, HIVE-21304.23.patch, 
> HIVE-21304.24.patch, HIVE-21304.25.patch, HIVE-21304.26.patch, 
> HIVE-21304.27.patch, HIVE-21304.28.patch, HIVE-21304.29.patch, 
> HIVE-21304.30.patch, HIVE-21304.31.patch, HIVE-21304.32.patch, 
> HIVE-21304.33.patch, HIVE-21304.33.patch, HIVE-21304.33.patch, 
> HIVE-21304.34.patch, HIVE-21304.34.patch, HIVE-21304.35.patch, 
> HIVE-21304.35.patch, HIVE-21304.36.patch, HIVE-21304.37.patch, 
> HIVE-21304.38.patch, HIVE-21304.38.patch, HIVE-21304.38.patch, 
> HIVE-21304.39.patch, HIVE-21304.40.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Show Bucketing version for ReduceSinkOp in explain extended plan - this 
> helps identify what hashing algorithm is being used by ReduceSinkOp (see the 
> sketch after this list).
> * move the actually selected version to the "conf" so that it doesn't get lost
> * replace trait related logic with a separate optimizer rule
> * do version selection based on a group of operators - this is more reliable
> * skip bucketing version selection for tables with 1 bucket
> * prefer to use version 2 if possible
> * fix operator creations which didn't set a new conf
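
A hedged illustration for the first bullet (table and query are hypothetical):

{code:sql}
-- With the change, the extended plan is expected to also report the
-- bucketing version chosen for each ReduceSinkOperator.
EXPLAIN EXTENDED SELECT key, count(*) FROM bucketed_t GROUP BY key;
{code}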





[jira] [Resolved] (HIVE-26230) Option to URL encode special chars in hbase.column.mapping that are valid HBase column family chars

2022-06-01 Thread Jira


 [ https://issues.apache.org/jira/browse/HIVE-26230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita resolved HIVE-26230.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks for reviewing [~pvary] 

> Option to URL encode special chars in hbase.column.mapping that are valid 
> HBase column family chars
> ---
>
> Key: HIVE-26230
> URL: https://issues.apache.org/jira/browse/HIVE-26230
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-26015 and HIVE-26139 aimed to fix missing special-character handling of 
> values provided for hbase.column.mapping. Values here are used as a URL for 
> Ranger-based authentication, and special characters need to be URL encoded 
> for this feature.
> This is currently done only for the # char. We should handle all special 
> characters that are valid HBase column family characters but count as special 
> characters for URLs.
> The URL encoding of HIVE-26015 should come back, as in HBase we can have 
> almost any characters for column family (excluding : / ). To make this a 
> backward-compatible change, the URL encoding will essentially be optional, so 
> users won't have to make changes to their working environment. Should they 
> encounter a special character in their HBase table definition though, they 
> can turn this URL encoding feature on, which in turn comes with the 
> requirement from their end to update their Ranger policies so they are in URL 
> encoded format for these tables.
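
A hedged DDL sketch of the scenario (table and mapping are hypothetical; the 
storage handler class and mapping property are the standard Hive ones):

{code:sql}
-- Hypothetical example: the HBase column family "cf#a" is valid in HBase,
-- but '#' is special in URLs, so with the new option turned on the mapping
-- is written URL-encoded ('%23').
CREATE TABLE hb_t (key string, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf%23a:q');
{code}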





[jira] [Work logged] (HIVE-26230) Option to URL encode special chars in hbase.column.mapping that are valid HBase column family chars

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26230?focusedWorklogId=777149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777149 ]

ASF GitHub Bot logged work on HIVE-26230:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 20:41
Start Date: 01/Jun/22 20:41
Worklog Time Spent: 10m 
  Work Description: szlta merged PR #3314:
URL: https://github.com/apache/hive/pull/3314




Issue Time Tracking
---

Worklog Id: (was: 777149)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777136 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 19:59
Start Date: 01/Jun/22 19:59
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r887249049


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java:
##
@@ -99,7 +99,8 @@ public int execute() throws HiveException {
   createTableNonReplaceMode(tbl);
 }
 
-DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context);
+  DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context);

Review Comment:
   Removed populating outputs in CreateTableOperation in the previous commit 
and retained it only in SemanticAnalyze. When removing the previous code, some 
extra whitespace crept in. 
   Will remove the extra line and space.





Issue Time Tracking
---

Worklog Id: (was: 777136)
Time Spent: 2h  (was: 1h 50m)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-26272) Inline util code that is used from log4j jar

2022-06-01 Thread Ayush Saxena (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HIVE-26272.
-
Fix Version/s: 4.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Inline util code that is used from log4j jar
> 
>
> Key: HIVE-26272
> URL: https://issues.apache.org/jira/browse/HIVE-26272
> Project: Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Affects Versions: 3.1.3
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/DRILL-8240 and related issues for 
> background.
> HiveServer2 uses the log4j Strings class for an isBlank method.
> I can add a PR to inline this code.





[jira] [Commented] (HIVE-26272) Inline util code that is used from log4j jar

2022-06-01 Thread Ayush Saxena (Jira)


[ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545063#comment-17545063 ]

Ayush Saxena commented on HIVE-26272:
-

Merged PR to master.

Thanx [~pj.fanning] for the contribution!!!



[jira] [Updated] (HIVE-26272) Inline util code that is used from log4j jar

2022-06-01 Thread Ayush Saxena (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HIVE-26272:

Summary: Inline util code that is used from log4j jar  (was: inline log4j 
util code used in HiveServer2.java)



[jira] [Work logged] (HIVE-26272) inline log4j util code used in HiveServer2.java

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26272?focusedWorklogId=777054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777054 ]

ASF GitHub Bot logged work on HIVE-26272:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 17:48
Start Date: 01/Jun/22 17:48
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged PR #3330:
URL: https://github.com/apache/hive/pull/3330




Issue Time Tracking
---

Worklog Id: (was: 777054)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=777009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777009 ]

ASF GitHub Bot logged work on HIVE-26196:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 16:49
Start Date: 01/Jun/22 16:49
Worklog Time Spent: 10m 
  Work Description: asolimando commented on PR #3254:
URL: https://github.com/apache/hive/pull/3254#issuecomment-1143867392

   @kgyrtkirk can you have another look? At the moment we are analyzing PRs and 
the master branch. I don't think we need to support labels or anything: whatever 
change we want to make has to pass through a PR, so we will spot any issues 
before merging. I think we can stick with just analyzing the master branch. WDYT?
   
   PS: I have updated the title of the Jira ticket; the message of the first 
commit is outdated. I am not amending it because that would trigger CI again 
for nothing.




Issue Time Tracking
---

Worklog Id: (was: 777009)
Time Spent: 1h 10m  (was: 1h)

> Integrate Sonar analysis for the master branch and PRs
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch and PRs.
> The ticket does not cover test coverage at the moment (it can be added in 
> follow-up tickets, if there is enough interest).
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline, but this step is run in parallel with the test run, so the 
> total end-to-end run-time is not affected.
> The idea for this first integration is to track code quality metrics over new 
> commits in the master branch and for PRs, without any quality gate rules 
> (i.e., the analysis will never fail, independently of the values of the 
> quality metrics).
> An example of analysis is available in the ASF Sonar account for Hive: [PR 
> analysis|https://sonarcloud.io/summary/new_code?id=apache_hive&pullRequest=3254]
> After integrating the changes, PRs will also be decorated with a link to the 
> analysis to be able to better evaluate any pain points of the contribution at 
> an earlier stage, making the life of the reviewers a bit easier.





[jira] [Updated] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26270:
--
Labels: compatibility pull-request-available timestamp  (was: compatibility 
timestamp)

> Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
> -
>
> Key: HIVE-26270
> URL: https://issues.apache.org/jira/browse/HIVE-26270
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Parquet
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: compatibility, pull-request-available, timestamp
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific.
> {code:sql}
> CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET;
> INSERT INTO employee VALUES 
> (1, '1880-01-01 00:00:00'),
> (2, '1884-01-01 00:00:00'),
> (3, '1990-01-01 00:00:00');
> {code}
> Parquet files read with Hive 4.0.0-alpha-1 onwards.
> +Without vectorization+ results are correct.
> {code:sql}
> SELECT * FROM employee;
> {code}
> {noformat}
> 1 1880-01-01 00:00:00
> 2 1884-01-01 00:00:00
> 3 1990-01-01 00:00:00
> {noformat}
> +With vectorization+ some timestamps are shifted.
> {code:sql}
> -- Disable fetch task conversion to force vectorization kick in
> set hive.fetch.task.conversion=none;
> SELECT * FROM employee;
> {code}
> {noformat}
> 1 1879-12-31 23:52:58
> 2 1884-01-01 00:00:00
> 3 1990-01-01 00:00:00
> {noformat}
> The problem is the same as the one reported under HIVE-24074. The data were 
> written using the new Date/Time APIs (java.time) in Hive 3.1.3 and here they 
> were read using the old APIs (java.sql).
> The difference with HIVE-24074 is that here the problem appears only for 
> vectorized execution while the non-vectorized reader works fine, so there is 
> some *inconsistency in the behavior* of the vectorized and non-vectorized 
> readers.
> The non-vectorized reader works fine because it derives automatically that it 
> should use the new JDK APIs to read back the timestamp value. This is 
> possible in this case because there is metadata in the file (i.e., 
> the presence of {{writer.time.zone}}) from which it can infer that the 
> timestamps were written using the new Date/Time APIs.
> The inconsistent behavior between the vectorized and non-vectorized readers 
> is a regression caused by HIVE-25104. This JIRA is an attempt to re-align 
> their behavior.
> Note that if the file metadata are empty, neither reader can determine which 
> APIs to use for the conversion, and in this case the user must set 
> {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get back 
> the correct results.
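
A hedged sketch of that explicit fallback, reusing the repro table above (the 
property name is taken from the description):

{code:sql}
-- If the Parquet file carries no writer metadata, request the legacy
-- timestamp conversion explicitly before reading.
set hive.parquet.timestamp.legacy.conversion.enabled=true;
set hive.fetch.task.conversion=none; -- force the vectorized path, as above
SELECT * FROM employee;
{code}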





[jira] [Work logged] (HIVE-26270) Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26270?focusedWorklogId=776969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776969 ]

ASF GitHub Bot logged work on HIVE-26270:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 15:49
Start Date: 01/Jun/22 15:49
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request, #3338:
URL: https://github.com/apache/hive/pull/3338

   ### What changes were proposed in this pull request?
   1. Extract legacy conversion derivation logic based on file metadata and 
configuration into separate method.
   2. Use the same logic for determining the conversion in both vectorized and 
non-vectorized Parquet readers by exploiting the new method.
   
   ### Why are the changes needed?
   1. Remedy "wrong" results when using the vectorized reader
   2. Align behavior between vectorized/non-vectorized code
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, result of the queries may be affected.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=parquet_timestamp_int96_compatibility_hive3_1_3.q`
   
   Compare wrong results in 
https://github.com/apache/hive/commit/5a1512ccf1619d744e65aa1a882326cb9df60dd8 
with correct results 
https://github.com/apache/hive/commit/e38b4ec868043e897ca2cc9da8b40a4742cb4757




Issue Time Tracking
---

Worklog Id: (was: 776969)
Remaining Estimate: 0h
Time Spent: 10m



[jira] [Updated] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-06-01 Thread Alessandro Solimando (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-26196:

Description: 
The aim of the ticket is to integrate SonarCloud analysis for the master branch 
and PRs.

The ticket does not cover test coverage at the moment (it can be added in 
follow-up tickets, if there is enough interest).

From preliminary tests, the analysis step requires 30 additional minutes for 
the pipeline, but this step is run in parallel with the test run, so the total 
end-to-end run-time is not affected.

The idea for this first integration is to track code quality metrics over new 
commits in the master branch and for PRs, without any quality gate rules (i.e., 
the analysis will never fail, independently of the values of the quality 
metrics).

An example of analysis is available in the ASF Sonar account for Hive: [PR 
analysis|https://sonarcloud.io/summary/new_code?id=apache_hive&pullRequest=3254]

After integrating the changes, PRs will also be decorated with a link to the 
analysis to be able to better evaluate any pain points of the contribution at 
an earlier stage, making the life of the reviewers a bit easier.

  was:
The aim of the ticket is to integrate SonarCloud analysis for the master branch.

The ticket does not cover:
 * test coverage
 * analysis on PRs and other branches

Those aspects can be added in follow-up tickets, if there is enough interest.

From preliminary tests, the analysis step requires 30 additional minutes for 
the pipeline.

The idea for this first integration is to track code quality metrics over new 
commits in the master branch, without any quality gate rules (i.e., the 
analysis will never fail, independently of the values of the quality metrics).

An example of analysis is available in my personal Sonar account: 
[https://sonarcloud.io/summary/new_code?id=asolimando_hive]

ASF offers SonarCloud accounts for Apache projects, and Hive already has one 
(https://sonarcloud.io/project/configuration?id=apache_hive, created via 
INFRA-22542), for completing the present ticket, somebody having admin 
permissions in that repo should generated an authentication token, which should 
replace the _SONAR_TOKEN_ secret in Jenkins.




[jira] [Updated] (HIVE-26196) Integrate Sonar analysis for the master branch and PRs

2022-06-01 Thread Alessandro Solimando (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-26196:

Summary: Integrate Sonar analysis for the master branch and PRs  (was: 
Integrate Sonar analysis for the master branch)



[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=776890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776890 ]

ASF GitHub Bot logged work on HIVE-26196:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 14:19
Start Date: 01/Jun/22 14:19
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3254:
URL: https://github.com/apache/hive/pull/3254#issuecomment-1143673904

   Kudos, SonarCloud Quality Gate passed!
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells.
   No Coverage information, No Duplication information.
   Full report: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3254




Issue Time Tracking
---

Worklog Id: (was: 776890)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776879 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:59
Start Date: 01/Jun/22 13:59
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886842968


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema readSchema, Map implements CloseableIterator {

Review Comment:
   Moved it and replaced `4` with `FILE_READ_META_COLS.size()`





Issue Time Tracking
---

Worklog Id: (was: 776879)
Time Spent: 5h 20m  (was: 5h 10m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects all tables touched by the statement. The read and write schema also 
> depends on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.h

[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776871 ]

ASF GitHub Bot logged work on HIVE-26282:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:43
Start Date: 01/Jun/22 13:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3337:
URL: https://github.com/apache/hive/pull/3337#discussion_r886824007


##
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java:
##
@@ -83,8 +83,9 @@ Type convertType(TypeInfo typeInfo) {
 return Types.BooleanType.get();
   case BYTE:
   case SHORT:
-Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer instead",
-((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());
+Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer " +

Review Comment:
   `Unsupported Hive type: %s, use integer instead or enable automatic type conversion, set 'iceberg.mr.schema.auto.conversion' to true`?





Issue Time Tracking
---

Worklog Id: (was: 776871)
Time Spent: 0.5h  (was: 20m)

> Improve iceberg CTAS error message for unsupported types
> 
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When running a CTAS query on a Hive table that has a tinyint, smallint, 
> varchar or char column, it fails with an "Unsupported Hive type" error 
> message. This can be worked around if the 
> 'iceberg.mr.schema.auto.conversion' property is set to true at session level. 
> We should communicate this possibility when raising the exception. 
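
A hedged sketch of the workaround the improved message should point at (table 
and column names are hypothetical; the property name is from the description):

{code:sql}
-- Hypothetical workaround: enable automatic type conversion for the session,
-- then retry the CTAS from a table that contains e.g. a TINYINT column.
set iceberg.mr.schema.auto.conversion=true;
CREATE TABLE ice_copy STORED BY ICEBERG AS SELECT * FROM src_tiny;
{code}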





[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776870 ]

ASF GitHub Bot logged work on HIVE-26282:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:42
Start Date: 01/Jun/22 13:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3337:
URL: https://github.com/apache/hive/pull/3337#discussion_r886823347


##
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java:
##
@@ -83,8 +83,9 @@ Type convertType(TypeInfo typeInfo) {
 return Types.BooleanType.get();
   case BYTE:
   case SHORT:
-Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer instead",
-((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());
+Preconditions.checkArgument(autoConvert, "Unsupported Hive type: %s, use integer " +
+"instead. To enable automatic type conversion, set 'iceberg.mr.schema.auto.conversion' to true " +
+"on session level.", ((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory());

Review Comment:
   Why `on session level`?





Issue Time Tracking
---

Worklog Id: (was: 776870)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776865
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:30
Start Date: 01/Jun/22 13:30
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886809647


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java:
##
@@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas
   if (source instanceof TableScanOperator) {
 TableScanOperator ts = (TableScanOperator) source;
 // push down projections
-ColumnProjectionUtils.appendReadColumns(
-job, ts.getNeededColumnIDs(), ts.getNeededColumns(), ts.getNeededNestedColumnPaths());
+ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), ts.getNeededColumns(),
+ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols());

Review Comment:
   Unfortunately it is not consistent whether we expose them or not. Example:
   
https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java#L86
   
   





Issue Time Tracking
---

Worklog Id: (was: 776865)
Time Spent: 5h 10m  (was: 5h)


[jira] [Work logged] (HIVE-26282) Improve iceberg CTAS error message for unsupported types

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26282?focusedWorklogId=776863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776863
 ]

ASF GitHub Bot logged work on HIVE-26282:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:27
Start Date: 01/Jun/22 13:27
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3337:
URL: https://github.com/apache/hive/pull/3337

   
   
   ### What changes were proposed in this pull request?
   Improve error message and add some unit tests
   
   
   
   ### Why are the changes needed?
   When running a CTAS query using a Hive table that has a tinyint, smallint, 
varchar or char column, it fails with an "Unsupported Hive type" error message. 
This can be worked around if the 'iceberg.mr.schema.auto.conversion' property 
is set to true at the session level. We should communicate this possibility 
when raising the exception.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   Unit test
   
   




Issue Time Tracking
---

Worklog Id: (was: 776863)
Remaining Estimate: 0h
Time Spent: 10m

> Improve iceberg CTAS error message for unsupported types
> 
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running a CTAS query using a Hive table that has a tinyint, smallint, 
> varchar or char column, it fails with an "Unsupported Hive type" error 
> message. This can be worked around if the 
> 'iceberg.mr.schema.auto.conversion' property is set to true at the session 
> level. We should communicate this possibility when raising the exception. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.3

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=776864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776864
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:27
Start Date: 01/Jun/22 13:27
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on PR #3279:
URL: https://github.com/apache/hive/pull/3279#issuecomment-1143612371

   > I would guess the directory listing order might have changed...
   
   shouldn't have AFAIK




Issue Time Tracking
---

Worklog Id: (was: 776864)
Time Spent: 12.05h  (was: 11h 53m)

> Upgrade Hadoop to 3.3.3
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12.05h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26282) Improve iceberg CTAS error message for unsupported types

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26282:
--
Labels: pull-request-available  (was: )

> Improve iceberg CTAS error message for unsupported types
> 
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running a CTAS query using a Hive table that has a tinyint, smallint, 
> varchar or char column, it fails with an "Unsupported Hive type" error 
> message. This can be worked around if the 
> 'iceberg.mr.schema.auto.conversion' property is set to true at the session 
> level. We should communicate this possibility when raising the exception. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26280) Copy more data into COMPLETED_COMPACTIONS for better supportability

2022-06-01 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-26280:



> Copy more data into COMPLETED_COMPACTIONS for better supportability
> ---
>
> Key: HIVE-26280
> URL: https://issues.apache.org/jira/browse/HIVE-26280
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>
> There is some information in COMPACTION_QUEUE that doesn't get copied over to 
> COMPLETED_COMPACTIONS when compaction completes. It would help with 
> supportability if COMPLETED_COMPACTIONS (and especially the view of it in the 
> SYS database) also contained this information.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26282) Improve iceberg CTAS error message for unsupported types

2022-06-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-26282:



> Improve iceberg CTAS error message for unsupported types
> 
>
> Key: HIVE-26282
> URL: https://issues.apache.org/jira/browse/HIVE-26282
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> When running a CTAS query using a Hive table that has a tinyint, smallint, 
> varchar or char column, it fails with an "Unsupported Hive type" error 
> message. This can be worked around if the 
> 'iceberg.mr.schema.auto.conversion' property is set to true at the session 
> level. We should communicate this possibility when raising the exception. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26281) Missing statistics when requesting partition by names via HS2

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-26281:
--


> Missing statistics when requesting partition by names via HS2
> -
>
> Key: HIVE-26281
> URL: https://issues.apache.org/jira/browse/HIVE-26281
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  method can be used to obtain partition objects from the metastore by 
> specifying their names and other options.
> {code:java}
> public List<Partition> getPartitionsByNames(Table tbl, List<String> 
> partNames, boolean getColStats){code}
> However, the partition statistics are missing from the returned objects no 
> matter the value of the {{getColStats}} parameter.
> The problem is 
> [here|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4174]
>  and was caused by HIVE-24743.
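A minimal call sketch against the signature quoted above; obtaining the client and table objects is elided, and the partition name is illustrative.
{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;

class MissingStatsSketch {
  // Sketch only: `hive` and `tbl` are assumed to be set up elsewhere.
  static List<Partition> fetch(Hive hive, Table tbl) throws Exception {
    // getColStats is true, yet per this report the returned partitions
    // carry no column statistics.
    return hive.getPartitionsByNames(tbl, Arrays.asList("ds=2022-06-01"), true);
  }
}
{code}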



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776858
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:12
Start Date: 01/Jun/22 13:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886789104


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String 
destination) {
 AcidUtils.Operation.INSERT);
   }
 
+  private Context.Operation getWriteOperation(String destination) {

Review Comment:
   Thanks for the explanation





Issue Time Tracking
---

Worklog Id: (was: 776858)
Time Spent: 5h  (was: 4h 50m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776857
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:10
Start Date: 01/Jun/22 13:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886787413


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private void setWriteOperation(Configuration conf) {

Review Comment:
   Thx





Issue Time Tracking
---

Worklog Id: (was: 776857)
Time Spent: 4h 50m  (was: 4h 40m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Er

[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.3

2022-06-01 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-24484:

Summary: Upgrade Hadoop to 3.3.3  (was: Upgrade Hadoop to 3.3.1)

> Upgrade Hadoop to 3.3.3
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 53m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776856
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 13:06
Start Date: 01/Jun/22 13:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886783428


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java:
##
@@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan 
queryPlan, TaskQueue tas
   if (source instanceof TableScanOperator) {
 TableScanOperator ts = (TableScanOperator) source;
 // push down projections
-ColumnProjectionUtils.appendReadColumns(
-job, ts.getNeededColumnIDs(), ts.getNeededColumns(), 
ts.getNeededNestedColumnPaths());
+ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), 
ts.getNeededColumns(),
+ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols());

Review Comment:
   Do we usually expose the config object of the operators for tasks?





Issue Time Tracking
---

Worklog Id: (was: 776856)
Time Spent: 4h 40m  (was: 4.5h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.Ma

[jira] [Updated] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26279:
--
Labels: pull-request-available  (was: )

> Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
> 
>
> Key: HIVE-26279
> URL: https://issues.apache.org/jira/browse/HIVE-26279
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Some tests in TestHiveMetaStoreClientApiArgumentsChecker are creating 
> requests but not really using them, so they are basically dead code that can 
> be removed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26279?focusedWorklogId=776851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776851
 ]

ASF GitHub Bot logged work on HIVE-26279:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 12:56
Start Date: 01/Jun/22 12:56
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request, #3336:
URL: https://github.com/apache/hive/pull/3336

   ### What changes were proposed in this pull request?
   Remove useless code
   
   ### Why are the changes needed?
   Readability
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   `mvn test -Dtest=TestHiveMetaStoreClientApiArgumentsChecker`
   




Issue Time Tracking
---

Worklog Id: (was: 776851)
Remaining Estimate: 0h
Time Spent: 10m

> Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
> 
>
> Key: HIVE-26279
> URL: https://issues.apache.org/jira/browse/HIVE-26279
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Some tests in TestHiveMetaStoreClientApiArgumentsChecker are creating 
> requests but not really using them, so they are basically dead code that can 
> be removed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26278?focusedWorklogId=776849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776849
 ]

ASF GitHub Bot logged work on HIVE-26278:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 12:49
Start Date: 01/Jun/22 12:49
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request, #3335:
URL: https://github.com/apache/hive/pull/3335

   ### What changes were proposed in this pull request?
   New test cases for more code coverage.
   
   ### Why are the changes needed?
   Ensure that ValidWriteIdList is set when batching is involved in 
`Hive#getPartitionsByNames`.
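   For context, a generic sketch of the batched-fetch shape under test — an illustration of the pattern, not Hive's actual implementation:
   ```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchingSketch {
  // Split the name list into fixed-size chunks and fetch each chunk; a
  // per-request field such as ValidWriteIdList must be set on every chunk.
  static <T> List<T> fetchInBatches(List<String> names, int batchSize,
      Function<List<String>, List<T>> fetchBatch) {
    List<T> out = new ArrayList<>();
    for (int i = 0; i < names.size(); i += batchSize) {
      int end = Math.min(i + batchSize, names.size());
      out.addAll(fetchBatch.apply(names.subList(i, end)));
    }
    return out;
  }
}
   ```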
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   `mvn test -Dtest=TestHiveMetaStoreClientApiArgumentsChecker`




Issue Time Tracking
---

Worklog Id: (was: 776849)
Remaining Estimate: 0h
Time Spent: 10m

> Add unit tests for Hive#getPartitionsByNames using batching
> ---
>
> Key: HIVE-26278
> URL: https://issues.apache.org/jira/browse/HIVE-26278
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  supports decomposing requests in batches but there are no unit tests 
> checking for the ValidWriteIdList when batching is used.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26278:
--
Labels: pull-request-available  (was: )

> Add unit tests for Hive#getPartitionsByNames using batching
> ---
>
> Key: HIVE-26278
> URL: https://issues.apache.org/jira/browse/HIVE-26278
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  supports decomposing requests in batches but there are no unit tests 
> checking for the ValidWriteIdList when batching is used.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25421?focusedWorklogId=776847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776847
 ]

ASF GitHub Bot logged work on HIVE-25421:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 12:44
Start Date: 01/Jun/22 12:44
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request, #3334:
URL: https://github.com/apache/hive/pull/3334

   As discussed in 
[HIVE-25420](https://issues.apache.org/jira/browse/HIVE-25420), the time column 
type is not a native Hive type; reading it is more complicated and is not 
supported for vectorized reads when the file format is ORC. Trying this 
currently results in an exception, so we should make an effort to gracefully 
fall back to non-vectorized reads when such a column is in the query's 
projection.
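   A rough sketch of the projection guard this implies; the class and method names are hypothetical, and only the TIME type id comes from Iceberg's public API:
   ```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Type;

class VectorizationFallbackSketch {
  // Hypothetical guard: fall back to row-mode (non-vectorized) reads when
  // the projected read schema contains an Iceberg TIME column.
  static boolean canVectorize(Schema readSchema) {
    return readSchema.columns().stream()
        .noneMatch(field -> field.type().typeId() == Type.TypeID.TIME);
  }
}
   ```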




Issue Time Tracking
---

Worklog Id: (was: 776847)
Remaining Estimate: 0h
Time Spent: 10m

> Fallback from vectorization when reading Iceberg's time columns
> ---
>
> Key: HIVE-25421
> URL: https://issues.apache.org/jira/browse/HIVE-25421
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in HIVE-25420, the time column type is not a native Hive type; 
> reading it is more complicated and is not supported for vectorized reads. 
> Trying this currently results in an exception, so we should make an effort to
>  * either gracefully fall back to non-vectorized reads when such a 
> column is in the query's projection
>  * or work around the reading issue on the execution side.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25421:
--
Labels: pull-request-available  (was: )

> Fallback from vectorization when reading Iceberg's time columns
> ---
>
> Key: HIVE-25421
> URL: https://issues.apache.org/jira/browse/HIVE-25421
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in HIVE-25420, the time column type is not a native Hive type; 
> reading it is more complicated and is not supported for vectorized reads. 
> Trying this currently results in an exception, so we should make an effort to
>  * either gracefully fall back to non-vectorized reads when such a 
> column is in the query's projection
>  * or work around the reading issue on the execution side.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-25421) Fallback from vectorization when reading Iceberg's time columns

2022-06-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-25421:
--
Summary: Fallback from vectorization when reading Iceberg's time columns  
(was: Add support for reading Iceberg's time columns with vectorization turned 
on)

> Fallback from vectorization when reading Iceberg's time columns
> ---
>
> Key: HIVE-25421
> URL: https://issues.apache.org/jira/browse/HIVE-25421
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> As discussed in HIVE-25420, the time column type is not a native Hive type; 
> reading it is more complicated and is not supported for vectorized reads. 
> Trying this currently results in an exception, so we should make an effort to
>  * either gracefully fall back to non-vectorized reads when such a 
> column is in the query's projection
>  * or work around the reading issue on the execution side.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=776830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776830
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 12:02
Start Date: 01/Jun/22 12:02
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on PR #3279:
URL: https://github.com/apache/hive/pull/3279#issuecomment-1143517119

   The jetty upgrade came in https://issues.apache.org/jira/browse/HADOOP-17796 & 
https://github.com/apache/hadoop/pull/3208; there were some security advisories 
there, so it is probably better to deal with the change than to try to stick to 
the older version. Sorry.




Issue Time Tracking
---

Worklog Id: (was: 776830)
Time Spent: 11h 53m  (was: 11h 43m)

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 53m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776826
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:57
Start Date: 01/Jun/22 11:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886717900


##
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/HiveCustomStorageHandlerUtils.java:
##
@@ -48,4 +54,13 @@ public static Map<String, String> getTableProperties(Table 
table) {
 .ifPresent(tblProps::putAll);
 return tblProps;
 }
+
+public static Context.Operation operation(Configuration conf, String 
tableName) {

Review Comment:
   Yes, this is what I did in my last commit. Only the method name is different 
:)





Issue Time Tracking
---

Worklog Id: (was: 776826)
Time Spent: 4.5h  (was: 4h 20m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoo

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776824
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:56
Start Date: 01/Jun/22 11:56
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886717199


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -616,6 +617,8 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
   initializeSpecPath();
   fs = specPath.getFileSystem(hconf);
 
+  hconf.set(WRITE_OPERATION_CONFIG_PREFIX + 
getConf().getTableInfo().getTableName(),

Review Comment:
   Moved both get/set write operation to HiveCustomStorageHandlerUtils.
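   Roughly this pair, sketched from the snippets quoted in this thread — the prefix value and the enum below are stand-ins, and the merged code may differ:
   ```java
import org.apache.hadoop.conf.Configuration;

class WriteOperationConfSketch {
  // Stand-in for the WRITE_OPERATION_CONFIG_PREFIX constant referenced above;
  // the real value is not quoted in this thread.
  static final String WRITE_OPERATION_CONFIG_PREFIX = "write.operation.";

  enum Operation { INSERT, UPDATE, DELETE }

  static void setWriteOperation(Configuration conf, String tableName, Operation op) {
    conf.set(WRITE_OPERATION_CONFIG_PREFIX + tableName, op.name());
  }

  static Operation getWriteOperation(Configuration conf, String tableName) {
    String value = conf.get(WRITE_OPERATION_CONFIG_PREFIX + tableName);
    return value == null ? null : Operation.valueOf(value);
  }
}
   ```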





Issue Time Tracking
---

Worklog Id: (was: 776824)
Time Spent: 4h 20m  (was: 4h 10m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776822
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:55
Start Date: 01/Jun/22 11:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886716379


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java:
##
@@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan 
queryPlan, TaskQueue tas
   if (source instanceof TableScanOperator) {
 TableScanOperator ts = (TableScanOperator) source;
 // push down projections
-ColumnProjectionUtils.appendReadColumns(
-job, ts.getNeededColumnIDs(), ts.getNeededColumns(), 
ts.getNeededNestedColumnPaths());
+ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), 
ts.getNeededColumns(),
+ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols());

Review Comment:
   I don't see the benefit of exposing this on TSOperator. Maybe the call here 
would be shorter.





Issue Time Tracking
---

Worklog Id: (was: 776822)
Time Spent: 4h 10m  (was: 4h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apa

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776820
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:53
Start Date: 01/Jun/22 11:53
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886714583


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String 
destination) {
 AcidUtils.Operation.INSERT);
   }
 
+  private Context.Operation getWriteOperation(String destination) {

Review Comment:
   No, these `destinations` are coming from the QueryParserInfo objects, e.g. 
`getQB().getParseInfo().getClauseNames().iterator().next();`,
   and are set in the UpdateDeleteSA like
   ```
 rewrittenCtx.setOperation(Context.Operation.DELETE);
 rewrittenCtx.addDestNamePrefix(1, Context.DestClausePrefix.DELETE);
   ```





Issue Time Tracking
---

Worklog Id: (was: 776820)
Time Spent: 4h  (was: 3h 50m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordS

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776818
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:49
Start Date: 01/Jun/22 11:49
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886711886


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private void setWriteOperation(Configuration conf) {

Review Comment:
   Moved both get/set to `HiveCustomStorageHandlerUtils`.





Issue Time Tracking
---

Worklog Id: (was: 776818)
Time Spent: 3h 50m  (was: 3h 40m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global, 
> meaning it affects every table touched by the statement. The read and 
> write schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.ha

[jira] [Work started] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26279 started by Stamatis Zampetakis.
--
> Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
> 
>
> Key: HIVE-26279
> URL: https://issues.apache.org/jira/browse/HIVE-26279
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>
> Some tests in TestHiveMetaStoreClientApiArgumentsChecker create requests 
> but never really use them, so they are basically dead code that can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26279) Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-26279:
--


> Drop unused requests from TestHiveMetaStoreClientApiArgumentsChecker
> 
>
> Key: HIVE-26279
> URL: https://issues.apache.org/jira/browse/HIVE-26279
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>
> Some tests in TestHiveMetaStoreClientApiArgumentsChecker create requests 
> but never really use them, so they are basically dead code that can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=776814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776814
 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:42
Start Date: 01/Jun/22 11:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r886705645


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java:
##
@@ -99,7 +99,8 @@ public int execute() throws HiveException {
   createTableNonReplaceMode(tbl);
 }
 
-DDLUtils.addIfAbsentByName(new WriteEntity(tbl, 
WriteEntity.WriteType.DDL_NO_LOCK), context);
+  DDLUtils.addIfAbsentByName(new WriteEntity(tbl, 
WriteEntity.WriteType.DDL_NO_LOCK), context);

Review Comment:
   what changed here, extra space?





Issue Time Tracking
---

Worklog Id: (was: 776814)
Time Spent: 1h 50m  (was: 1h 40m)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26278:
---
Parent: HIVE-21637
Issue Type: Sub-task  (was: Task)

> Add unit tests for Hive#getPartitionsByNames using batching
> ---
>
> Key: HIVE-26278
> URL: https://issues.apache.org/jira/browse/HIVE-26278
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  supports decomposing requests into batches, but there are no unit tests 
> checking for the ValidWriteIdList when batching is used.
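
A minimal sketch of the decomposition step referred to above, with illustrative names (the real {{Hive#getPartitionsByNames}} also threads the table id and {{ValidWriteIdList}} through each batched metastore call):
{code}
import java.util.ArrayList;
import java.util.List;

public final class BatchingSketch {
  // Split a large list of partition names into fixed-size slices so that
  // no single metastore request exceeds the configured batch size.
  static <T> List<List<T>> toBatches(List<T> names, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < names.size(); i += batchSize) {
      batches.add(names.subList(i, Math.min(i + batchSize, names.size())));
    }
    return batches;
  }
}
{code}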



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-06-01 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544826#comment-17544826
 ] 

Stamatis Zampetakis commented on HIVE-25936:


New unit tests to be added as part of HIVE-26278

> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> According to HIVE-24743, the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26278 started by Stamatis Zampetakis.
--
> Add unit tests for Hive#getPartitionsByNames using batching
> ---
>
> Key: HIVE-26278
> URL: https://issues.apache.org/jira/browse/HIVE-26278
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  supports decomposing requests into batches, but there are no unit tests 
> checking for the ValidWriteIdList when batching is used.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26278) Add unit tests for Hive#getPartitionsByNames using batching

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-26278:
--


> Add unit tests for Hive#getPartitionsByNames using batching
> ---
>
> Key: HIVE-26278
> URL: https://issues.apache.org/jira/browse/HIVE-26278
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> [Hive#getPartitionsByNames|https://github.com/apache/hive/blob/6626b5564ee206db5a656d2f611ed71f10a0ffc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4155]
>  supports decomposing requests into batches, but there are no unit tests 
> checking for the ValidWriteIdList when batching is used.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25936?focusedWorklogId=776800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776800
 ]

ASF GitHub Bot logged work on HIVE-25936:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:18
Start Date: 01/Jun/22 11:18
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3007: HIVE-25936: 
ValidWriteIdList & table id are sometimes missing when requesting partitions by 
name via HS2
URL: https://github.com/apache/hive/pull/3007




Issue Time Tracking
---

Worklog Id: (was: 776800)
Time Spent: 40m  (was: 0.5h)

> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> According to HIVE-24743, the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25936?focusedWorklogId=776799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776799
 ]

ASF GitHub Bot logged work on HIVE-25936:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:18
Start Date: 01/Jun/22 11:18
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3007:
URL: https://github.com/apache/hive/pull/3007#issuecomment-1143474215

   I am closing this PR down since the bug was fixed as part of HIVE-25935. I 
will open new PRs for the additional refactoring and the tests.




Issue Time Tracking
---

Worklog Id: (was: 776799)
Time Spent: 0.5h  (was: 20m)

> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> According to HIVE-24743, the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26095) Add queryid in QueryLifeTimeHookContext

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26095?focusedWorklogId=776793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776793
 ]

ASF GitHub Bot logged work on HIVE-26095:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:00
Start Date: 01/Jun/22 11:00
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3156: HIVE-26095: Add 
queryid in QueryLifeTimeHookContext
URL: https://github.com/apache/hive/pull/3156




Issue Time Tracking
---

Worklog Id: (was: 776793)
Time Spent: 2h 20m  (was: 2h 10m)

> Add queryid in QueryLifeTimeHookContext
> ---
>
> Key: HIVE-26095
> URL: https://issues.apache.org/jira/browse/HIVE-26095
> Project: Hive
>  Issue Type: New Feature
>  Components: Hooks
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A 
> [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java]
>  is executed at various points in the life-cycle of a query, but it is not 
> always possible to obtain the id of the query. The query id is inside the 
> {{HookContext}}, but the latter is not always available, notably during 
> compilation.
> The query id is useful for many purposes, as it is the only way to uniquely 
> identify the query/command that is currently running. It is also the only way 
> to match events appearing in the before and after methods.
> The goal of this jira is to add the query id in 
> [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java]
>  and make it available during all life-cycle events.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-25936) ValidWriteIdList & table id are sometimes missing when requesting partitions by name via HS2

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25936.

Fix Version/s: 4.0.0-alpha-1
   Resolution: Fixed

The bug reported here was fixed by the cleanup done in HIVE-25935, so I am 
marking this JIRA as fixed.

The PR contains some useful refactoring and tests; I will follow up on them 
under new PRs/JIRAs.

> ValidWriteIdList & table id are sometimes missing when requesting partitions 
> by name via HS2
> 
>
> Key: HIVE-25936
> URL: https://issues.apache.org/jira/browse/HIVE-25936
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to HIVE-24743, the table id and {{ValidWriteIdList}} are important 
> for keeping the HMS remote metadata cache consistent. Although HIVE-24743 
> attempted to pass the write id list and table id in every call to HMS, it 
> failed to do so completely. For those partitions not handled in the batch 
> logic, the [metastore 
> call|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4161]
>  in the {{Hive#getPartitionsByName}} method does not pass the table id and 
> write id list.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26095) Add queryid in QueryLifeTimeHookContext

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26095.

Resolution: Won't Fix

As discussed under the 
[PR|https://github.com/apache/hive/pull/3156#discussion_r840394149], it is 
possible to obtain the query id via the Hive configuration (using the 
{{hive.query.id}} property), so there is no need to introduce a new API for 
this purpose; thus I am closing this JIRA as won't fix.
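
A minimal sketch of that approach, assuming the standard {{QueryLifeTimeHook}} interface and reading the id straight from the configuration instead of a new API:
{code}
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;

public class QueryIdLoggingHook implements QueryLifeTimeHook {

  // The driver sets hive.query.id per query, so it is readable from the
  // configuration even during compilation, when HookContext is unavailable.
  private String queryId(QueryLifeTimeHookContext ctx) {
    return ctx.getHiveConf().get("hive.query.id");
  }

  @Override
  public void beforeCompile(QueryLifeTimeHookContext ctx) {
    System.out.println("compiling " + queryId(ctx));
  }

  @Override
  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
    System.out.println("compiled " + queryId(ctx) + ", error=" + hasError);
  }

  @Override
  public void beforeExecution(QueryLifeTimeHookContext ctx) {
    System.out.println("executing " + queryId(ctx));
  }

  @Override
  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
    System.out.println("executed " + queryId(ctx) + ", error=" + hasError);
  }
}
{code}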

> Add queryid in QueryLifeTimeHookContext
> ---
>
> Key: HIVE-26095
> URL: https://issues.apache.org/jira/browse/HIVE-26095
> Project: Hive
>  Issue Type: New Feature
>  Components: Hooks
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A 
> [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java]
>  is executed at various points in the life-cycle of a query, but it is not 
> always possible to obtain the id of the query. The query id is inside the 
> {{HookContext}}, but the latter is not always available, notably during 
> compilation.
> The query id is useful for many purposes, as it is the only way to uniquely 
> identify the query/command that is currently running. It is also the only way 
> to match events appearing in the before and after methods.
> The goal of this jira is to add the query id in 
> [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java]
>  and make it available during all life-cycle events.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (HIVE-26095) Add queryid in QueryLifeTimeHookContext

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis closed HIVE-26095.
--

> Add queryid in QueryLifeTimeHookContext
> ---
>
> Key: HIVE-26095
> URL: https://issues.apache.org/jira/browse/HIVE-26095
> Project: Hive
>  Issue Type: New Feature
>  Components: Hooks
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A 
> [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java]
>  is executed at various points in the life-cycle of a query, but it is not 
> always possible to obtain the id of the query. The query id is inside the 
> {{HookContext}}, but the latter is not always available, notably during 
> compilation.
> The query id is useful for many purposes, as it is the only way to uniquely 
> identify the query/command that is currently running. It is also the only way 
> to match events appearing in the before and after methods.
> The goal of this jira is to add the query id in 
> [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java]
>  and make it available during all life-cycle events.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26095) Add queryid in QueryLifeTimeHookContext

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26095?focusedWorklogId=776792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776792
 ]

ASF GitHub Bot logged work on HIVE-26095:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 11:00
Start Date: 01/Jun/22 11:00
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3156:
URL: https://github.com/apache/hive/pull/3156#issuecomment-1143454631

   As discussed 
[previously](https://github.com/apache/hive/pull/3156#discussion_r840394149), 
there is no need to introduce a new API since it is possible to achieve the 
same result via the Hive configuration, so I am closing this PR.




Issue Time Tracking
---

Worklog Id: (was: 776792)
Time Spent: 2h 10m  (was: 2h)

> Add queryid in QueryLifeTimeHookContext
> ---
>
> Key: HIVE-26095
> URL: https://issues.apache.org/jira/browse/HIVE-26095
> Project: Hive
>  Issue Type: New Feature
>  Components: Hooks
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A 
> [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java]
>  is executed at various points in the life-cycle of a query, but it is not 
> always possible to obtain the id of the query. The query id is inside the 
> {{HookContext}}, but the latter is not always available, notably during 
> compilation.
> The query id is useful for many purposes, as it is the only way to uniquely 
> identify the query/command that is currently running. It is also the only way 
> to match events appearing in the before and after methods.
> The goal of this jira is to add the query id in 
> [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java]
>  and make it available during all life-cycle events.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26095) Add queryid in QueryLifeTimeHookContext

2022-06-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26095:
---
Fix Version/s: (was: 4.0.0-alpha-2)

> Add queryid in QueryLifeTimeHookContext
> ---
>
> Key: HIVE-26095
> URL: https://issues.apache.org/jira/browse/HIVE-26095
> Project: Hive
>  Issue Type: New Feature
>  Components: Hooks
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A 
> [QueryLifeTimeHook|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHook.java]
>  is executed at various points in the life-cycle of a query, but it is not 
> always possible to obtain the id of the query. The query id is inside the 
> {{HookContext}}, but the latter is not always available, notably during 
> compilation.
> The query id is useful for many purposes, as it is the only way to uniquely 
> identify the query/command that is currently running. It is also the only way 
> to match events appearing in the before and after methods.
> The goal of this jira is to add the query id in 
> [QueryLifeTimeHookContext|https://github.com/apache/hive/blob/6c0b86ef0cfc67c5acb3468408e1d46fa6ef8024/ql/src/java/org/apache/hadoop/hive/ql/hooks/QueryLifeTimeHookContext.java]
>  and make it available during all life-cycle events.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26196) Integrate Sonar analysis for the master branch

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26196?focusedWorklogId=776772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776772
 ]

ASF GitHub Bot logged work on HIVE-26196:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 10:15
Start Date: 01/Jun/22 10:15
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3254:
URL: https://github.com/apache/hive/pull/3254#issuecomment-1143408741

   Kudos, SonarCloud Quality Gate passed!
   [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3254)
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell
   0.0% Coverage, 0.0% Duplication




Issue Time Tracking
---

Worklog Id: (was: 776772)
Time Spent: 50m  (was: 40m)

> Integrate Sonar analysis for the master branch
> --
>
> Key: HIVE-26196
> URL: https://issues.apache.org/jira/browse/HIVE-26196
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The aim of the ticket is to integrate SonarCloud analysis for the master 
> branch.
> The ticket does not cover:
>  * test coverage
>  * analysis on PRs and other branches
> Those aspects can be added in follow-up tickets, if there is enough interest.
> From preliminary tests, the analysis step requires 30 additional minutes for 
> the pipeline.
> The idea for this first integration is to track code quality metrics over 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776771
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 10:13
Start Date: 01/Jun/22 10:13
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886633690


##
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/HiveCustomStorageHandlerUtils.java:
##
@@ -48,4 +54,13 @@ public static Map<String, String> getTableProperties(Table 
table) {
 .ifPresent(tblProps::putAll);
 return tblProps;
 }
+
+public static Context.Operation operation(Configuration conf, String 
tableName) {

Review Comment:
   So maybe another method like:
`HiveCustomStorageHandlerUtils.setOperation(hconf, tableName, operation)`?
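   
   For context, a hypothetical sketch of such a get/set pair keyed by table name under a config prefix (the names and the prefix value mirror this thread and are illustrative, not the final committed API):
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hive.ql.Context;
   
   public final class WriteOperationConfSketch {
     // Illustrative prefix; the real constant would live in one shared class.
     private static final String WRITE_OPERATION_CONFIG_PREFIX = "hive.io.write.operation.";
   
     public static void setOperation(Configuration conf, String tableName, Context.Operation op) {
       conf.set(WRITE_OPERATION_CONFIG_PREFIX + tableName, op.name());
     }
   
     public static Context.Operation getOperation(Configuration conf, String tableName) {
       String op = conf.get(WRITE_OPERATION_CONFIG_PREFIX + tableName);
       // Assumed default when nothing was recorded for this table.
       return op == null ? Context.Operation.OTHER : Context.Operation.valueOf(op);
     }
   }
   ```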





Issue Time Tracking
---

Worklog Id: (was: 776771)
Time Spent: 3h 40m  (was: 3.5h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776770
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 10:12
Start Date: 01/Jun/22 10:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886632668


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String 
destination) {
 AcidUtils.Operation.INSERT);
   }
 
+  private Context.Operation getWriteOperation(String destination) {

Review Comment:
   Is this reading the operation set by 
`HiveCustomStorageHandlerUtils.setWrite(hconf, tableName)`?





Issue Time Tracking
---

Worklog Id: (was: 776770)
Time Spent: 3.5h  (was: 3h 20m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunP

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776768
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 10:09
Start Date: 01/Jun/22 10:09
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886629784


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -616,6 +617,8 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
   initializeSpecPath();
   fs = specPath.getFileSystem(hconf);
 
+  hconf.set(WRITE_OPERATION_CONFIG_PREFIX + 
getConf().getTableInfo().getTableName(),

Review Comment:
   Could we just do this like:
   `HiveCustomStorageHandlerUtils.setWrite(hconf, tableName)`?





Issue Time Tracking
---

Worklog Id: (was: 776768)
Time Spent: 3h 20m  (was: 3h 10m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776738
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:29
Start Date: 01/Jun/22 09:29
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886591933


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java:
##
@@ -78,8 +78,8 @@ public void initialize(QueryState queryState, QueryPlan 
queryPlan, TaskQueue tas
   if (source instanceof TableScanOperator) {
 TableScanOperator ts = (TableScanOperator) source;
 // push down projections
-ColumnProjectionUtils.appendReadColumns(
-job, ts.getNeededColumnIDs(), ts.getNeededColumns(), 
ts.getNeededNestedColumnPaths());
+ColumnProjectionUtils.appendReadColumns(job, ts.getNeededColumnIDs(), 
ts.getNeededColumns(),
+ts.getNeededNestedColumnPaths(), ts.conf.hasVirtualCols());

Review Comment:
   nit: Shall we expose `hasVirtualCols` on TSOperator instead of exposing and 
using the `conf`?
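   
   For illustration, a hypothetical delegating getter on `TableScanOperator`, so the call site above could read `ts.hasVirtualCols()` without touching `conf`:
   
   ```java
   // Sketch: inside TableScanOperator, delegating to its TableScanDesc conf.
   public boolean hasVirtualCols() {
     return conf.hasVirtualCols();
   }
   ```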





Issue Time Tracking
---

Worklog Id: (was: 776738)
Time Spent: 3h 10m  (was: 3h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776736
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:28
Start Date: 01/Jun/22 09:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886590600


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema 
readSchema, Map implements 
CloseableIterator {

Review Comment:
   nit: maybe move this class to the IcebergAcidUtil, so we do not have to use 
magic numbers, like `4`?
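   
   For example, a hypothetical named constant in `IcebergAcidUtil` would document what the bare `4` means:
   
   ```java
   // Assumed meaning for illustration: the number of ACID metadata columns
   // prepended to each serialized row.
   public static final int ACID_META_COL_COUNT = 4;
   ```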





Issue Time Tracking
---

Worklog Id: (was: 776736)
Time Spent: 3h  (was: 2h 50m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 mo

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776734
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:27
Start Date: 01/Jun/22 09:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886590600


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -549,4 +534,43 @@ private static Schema schemaWithoutConstantsAndMeta(Schema 
readSchema, Map implements 
CloseableIterator {

Review Comment:
   nit: maybe move this to the IcebergAcidUtil, so we do not have to use magic 
numbers, like `4`?





Issue Time Tracking
---

Worklog Id: (was: 776734)
Time Spent: 2h 50m  (was: 2h 40m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects every table touched by the statement. The read and write schemas also 
> depend on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776732
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:21
Start Date: 01/Jun/22 09:21
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886584030


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private void setWriteOperation(Configuration conf) {

Review Comment:
   Would it make sense to keep the read/set part in the same class?
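   For readers skimming the thread: the symmetry being suggested is roughly the
   following. This is a minimal sketch in which the class name and key literal
   are assumptions for illustration, not Hive's actual API.

{code:java}
import org.apache.hadoop.conf.Configuration;

// One class owns both the read and the set side of the key, so the key
// literal never leaks into operators such as FileSinkOperator.
public final class WriteOperationConfig {
  private static final String WRITE_OPERATION_KEY = "hive.io.write.operation"; // assumed key

  private WriteOperationConfig() {
  }

  public static void setWriteOperation(Configuration conf, String operation) {
    conf.set(WRITE_OPERATION_KEY, operation);
  }

  public static String getWriteOperation(Configuration conf) {
    return conf.get(WRITE_OPERATION_KEY);
  }
}
{code}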





Issue Time Tracking
---

Worklog Id: (was: 776732)
Time Spent: 2h 40m  (was: 2.5h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apach

[jira] [Resolved] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-06-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25907.
---
Resolution: Fixed

Pushed to master.
Thanks for the fix [~srahman]!

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write their data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After the above query executes, the destination directory is expected to 
> contain the data from table iowd, but due to HIVE-21386 this no longer 
> happens.
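For builds that predate this fix, a plausible session-level workaround (an
inference from the trigger described above, not a verified recommendation) is
to disable the results cache before running the IOW statement:

{code}
set hive.query.results.cache.enabled=false;
INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
{code}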



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?focusedWorklogId=776729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776729
 ]

ASF GitHub Bot logged work on HIVE-25907:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:18
Start Date: 01/Jun/22 09:18
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #2978:
URL: https://github.com/apache/hive/pull/2978




Issue Time Tracking
---

Worklog Id: (was: 776729)
Time Spent: 4h 20m  (was: 4h 10m)

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write their data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After the above query executes, the destination directory is expected to 
> contain the data from table iowd, but due to HIVE-21386 this no longer 
> happens.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776726
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 09:07
Start Date: 01/Jun/22 09:07
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886570597


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private void setWriteOperation(Configuration conf) {

Review Comment:
   Moved the read part to InputFormatConfig.java, but it is still set in 
FileSinkOperator.
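   In other words, the read side now sits with the other input-format keys
   while the operator keeps the write side. A sketch of the read half follows;
   only the method name comes from the diff, and the key literal is an
   assumption.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class InputFormatConfig {
  private static final String FETCH_VIRTUAL_COLUMNS = "iceberg.mr.fetch.virtual.columns"; // assumed key

  // Readers such as IcebergInputFormat call this instead of touching the key directly.
  public static boolean fetchVirtualColumns(Configuration conf) {
    return conf.getBoolean(FETCH_VIRTUAL_COLUMNS, false);
  }
}
{code}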





Issue Time Tracking
---

Worklog Id: (was: 776726)
Time Spent: 2.5h  (was: 2h 20m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more

[jira] [Updated] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes

2022-06-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26277:

Component/s: Standalone Metastore

> Add unit tests for ColumnStatsAggregator classes
> 
>
> Key: HIVE-26277
> URL: https://issues.apache.org/jira/browse/HIVE-26277
> Project: Hive
>  Issue Type: Test
>  Components: Standalone Metastore, Statistics, Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> We have no unit tests covering these classes, which also happen to contain 
> some complicated logic, making the absence of tests even more risky.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes

2022-06-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26277 started by Alessandro Solimando.
---
> Add unit tests for ColumnStatsAggregator classes
> 
>
> Key: HIVE-26277
> URL: https://issues.apache.org/jira/browse/HIVE-26277
> Project: Hive
>  Issue Type: Test
>  Components: Statistics, Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> We have no unit tests covering these classes, which also happen to contain 
> some complicated logic, making the absence of tests even more risky.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776719
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 08:38
Start Date: 01/Jun/22 08:38
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886542319


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -932,7 +940,9 @@ protected void createBucketForFileIdx(FSPaths fsp, int 
filesIdx)
 && !FileUtils.mkdir(fs, outPath.getParent(), hconf)) {
   LOG.warn("Unable to create directory with inheritPerms: " + outPath);
 }
-fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jc, 
conf.getTableInfo(),
+JobConf jobConf = new JobConf(jc);
+setWriteOperation(jobConf);
+fsp.outWriters[filesIdx] = 
HiveFileFormatUtils.getHiveRecordWriter(jobConf, conf.getTableInfo(),

Review Comment:
   Changing the method signature would alter every file format Hive supports.
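   The diff above shows the chosen alternative: the extra context rides inside
   a copied JobConf rather than a new parameter, so no RecordWriter
   implementation has to change. A condensed sketch, where the key literal is
   illustrative rather than Hive's actual constant:

{code:java}
import org.apache.hadoop.mapred.JobConf;

final class WriteOperationConf {
  // Copy the job conf so the shared instance is not mutated, then stash the
  // write operation where the file format's record writer can read it back.
  static JobConf withWriteOperation(JobConf jc, String operation) {
    JobConf copy = new JobConf(jc);
    copy.set("write.operation", operation); // assumed key name
    return copy;
  }
}
{code}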





Issue Time Tracking
---

Worklog Id: (was: 776719)
Time Spent: 2h 20m  (was: 2h 10m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.proce

[jira] [Assigned] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes

2022-06-01 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26277:
---


> Add unit tests for ColumnStatsAggregator classes
> 
>
> Key: HIVE-26277
> URL: https://issues.apache.org/jira/browse/HIVE-26277
> Project: Hive
>  Issue Type: Test
>  Components: Statistics, Tests
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> We have no unit tests covering these classes, which also happen to contain 
> some complicated logic, making the absence of tests even more risky.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776717
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 08:36
Start Date: 01/Jun/22 08:36
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886540554


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String 
destination) {
 AcidUtils.Operation.INSERT);
   }
 
+  private Context.Operation getWriteOperation(String destination) {

Review Comment:
   The value in `destination` has a variable part, so it cannot easily be 
mapped to an enum constant.
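   That is, the mapping has to be done by prefix rather than by name; a
   hypothetical sketch, where the prefix literals and the enum are stand-ins
   for illustration, not the actual Hive constants:

{code:java}
// The destination token ends with a per-clause suffix, so Enum.valueOf on the
// whole string cannot work; test the stable prefix instead.
enum Operation { UPDATE, DELETE, OTHER }

static Operation getWriteOperation(String destination) {
  if (destination.startsWith("update")) {  // assumed prefix
    return Operation.UPDATE;
  }
  if (destination.startsWith("delete")) {  // assumed prefix
    return Operation.DELETE;
  }
  return Operation.OTHER;
}
{code}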





Issue Time Tracking
---

Worklog Id: (was: 776717)
Time Spent: 2h 10m  (was: 2h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initiali

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776716
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 08:35
Start Date: 01/Jun/22 08:35
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886538949


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -11433,6 +11435,7 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
   // Determine row schema for TSOP.
   // Include column names from SerDe, the partition and virtual columns.
   rwsch = new RowResolver();
+

Review Comment:
   reverted





Issue Time Tracking
---

Worklog Id: (was: 776716)
Time Spent: 2h  (was: 1h 50m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776714
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 08:33
Start Date: 01/Jun/22 08:33
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886537498


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -259,22 +258,27 @@ public void initialize(InputSplit split, 
TaskAttemptContext newContext) {
   this.inMemoryDataModel = 
conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
   InputFormatConfig.InMemoryDataModel.GENERIC);
   this.currentIterator = open(tasks.next(), expectedSchema).iterator();
-  Operation operation = HiveIcebergStorageHandler.operation(conf, 
conf.get(Catalogs.NAME));
-  this.updateOrDelete = Operation.DELETE.equals(operation) || 
Operation.UPDATE.equals(operation);
+  this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
 }
 
 @Override
 public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
-  if (updateOrDelete) {
+  if (fetchVirtualColumns) {
 GenericRecord rec = (GenericRecord) current;
 PositionDeleteInfo.setIntoConf(conf,
 IcebergAcidUtil.parseSpecId(rec),
 IcebergAcidUtil.computePartitionHash(rec),
 IcebergAcidUtil.parseFilePath(rec),
 IcebergAcidUtil.parseFilePosition(rec));
+GenericRecord tmp = GenericRecord.create(

Review Comment:
   Created a separate class, `VirtualColumnAwareIterator`, to wrap and handle 
this.
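   Schematically the wrapper looks like the generic decorator below; this is a
   sketch of the pattern only, not the body of the actual Hive class.

{code:java}
import java.util.Iterator;
import java.util.function.Consumer;

// Decorates a record iterator so the per-record side work (here: extracting
// and publishing virtual-column values) lives in one place instead of being
// inlined in nextKeyValue().
final class VirtualColumnAwareIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final Consumer<T> publishVirtualColumns;

  VirtualColumnAwareIterator(Iterator<T> delegate, Consumer<T> publishVirtualColumns) {
    this.delegate = delegate;
    this.publishVirtualColumns = publishVirtualColumns;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public T next() {
    T record = delegate.next();
    publishVirtualColumns.accept(record);
    return record;
  }
}
{code}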





Issue Time Tracking
---

Worklog Id: (was: 776714)
Time Spent: 1h 50m  (was: 1h 40m)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunne

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=776711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776711
 ]

ASF GitHub Bot logged work on HIVE-26264:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 08:32
Start Date: 01/Jun/22 08:32
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r886536461


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -259,22 +258,27 @@ public void initialize(InputSplit split, 
TaskAttemptContext newContext) {
   this.inMemoryDataModel = 
conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
   InputFormatConfig.InMemoryDataModel.GENERIC);
   this.currentIterator = open(tasks.next(), expectedSchema).iterator();
-  Operation operation = HiveIcebergStorageHandler.operation(conf, 
conf.get(Catalogs.NAME));
-  this.updateOrDelete = Operation.DELETE.equals(operation) || 
Operation.UPDATE.equals(operation);
+  this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
 }
 
 @Override
 public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
-  if (updateOrDelete) {
+  if (fetchVirtualColumns) {
 GenericRecord rec = (GenericRecord) current;
 PositionDeleteInfo.setIntoConf(conf,
 IcebergAcidUtil.parseSpecId(rec),
 IcebergAcidUtil.computePartitionHash(rec),
 IcebergAcidUtil.parseFilePath(rec),
 IcebergAcidUtil.parseFilePosition(rec));
+GenericRecord tmp = GenericRecord.create(
+new Schema(expectedSchema.columns().subList(4, 
expectedSchema.columns().size(;
+for (int i = 4; i < expectedSchema.columns().size(); ++i) {

Review Comment:
   Moved
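   For context, the block in question projects away the leading virtual
   columns; a condensed sketch mirroring the diff above, with the helper name
   and the virtual-column count parameter invented for illustration:

{code:java}
import org.apache.iceberg.Schema;
import org.apache.iceberg.data.GenericRecord;

// Copies the trailing data columns of `rec` into a record whose schema omits
// the first `numVirtual` columns, as the loop in the diff does for 4 columns.
static GenericRecord stripVirtualColumns(GenericRecord rec, Schema schema, int numVirtual) {
  Schema dataSchema = new Schema(schema.columns().subList(numVirtual, schema.columns().size()));
  GenericRecord result = GenericRecord.create(dataSchema);
  for (int i = numVirtual; i < schema.columns().size(); ++i) {
    result.set(i - numVirtual, rec.get(i));
  }
  return result;
}
{code}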





Issue Time Tracking
---

Worklog Id: (was: 776711)
Time Spent: 1h 40m  (was: 1.5h)

> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables whenever the 
> statement being executed is a delete or update statement, and the setting is 
> global: it affects every table touched by the statement. The read and write 
> schemas also depend on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2

[jira] [Work logged] (HIVE-25907) IOW Directory queries fails to write data to final path when query result cache is enabled

2022-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25907?focusedWorklogId=776704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776704
 ]

ASF GitHub Bot logged work on HIVE-25907:
-

Author: ASF GitHub Bot
Created on: 01/Jun/22 07:51
Start Date: 01/Jun/22 07:51
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on PR #2978:
URL: https://github.com/apache/hive/pull/2978#issuecomment-1143238349

   @pvary - Thanks for the review. 
   Are we good to merge this PR?




Issue Time Tracking
---

Worklog Id: (was: 776704)
Time Spent: 4h 10m  (was: 4h)

> IOW Directory queries fails to write data to final path when query result 
> cache is enabled
> --
>
> Key: HIVE-25907
> URL: https://issues.apache.org/jira/browse/HIVE-25907
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> INSERT OVERWRITE DIRECTORY queries fail to write their data to the specified 
> directory location when the query result cache is enabled.
> *Steps to reproduce*
> {code:java}
> 1. create a data file with the following data
> 1 abc 10.5
> 2 def 11.5
> 2. create table pointing to that data
> create external table iowd(strct struct)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ' '
> location '';
> 3. run the following query
> set hive.query.results.cache.enabled=true;
> INSERT OVERWRITE DIRECTORY "" SELECT * FROM iowd;
> {code}
> After the above query executes, the destination directory is expected to 
> contain the data from table iowd, but due to HIVE-21386 this no longer 
> happens.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)