Re: [PR] HIVE-12371: Use DriverManager.getLoginTimeout() as fallback [hive]

2024-02-28 Thread via GitHub


jneira-stratio commented on PR #4964:
URL: https://github.com/apache/hive/pull/4964#issuecomment-1970553683

   > there are some 180+ test failures & all looks related
   
   They might well be related. However, I can't see how this change _on its own_ could cause them, as it restores a previous behaviour (prior to #1611), and DriverManager.loginTimeout defaults to 0 as far as I can see (the previously fixed default value).
   Sorry, I don't have time to take a deeper look; feel free to close this (it will be closed automatically anyway, I guess).
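   For reference, a minimal sketch of the fallback behaviour the PR title describes, assuming a hypothetical helper around the standard `java.sql.DriverManager` API (this is not the actual Hive JDBC code):
   
   ```java
   import java.sql.DriverManager;
   
   final class LoginTimeoutFallback {
     // Hypothetical helper: resolve the effective login timeout in seconds.
     static int resolveLoginTimeoutSeconds(int configuredTimeoutSeconds) {
       if (configuredTimeoutSeconds > 0) {
         return configuredTimeoutSeconds; // an explicit connection parameter wins
       }
       // DriverManager.getLoginTimeout() defaults to 0 ("no timeout"),
       // which matches the fixed default used before #1611.
       return DriverManager.getLoginTimeout();
     }
   }
   ```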
   





Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]

2024-02-28 Thread via GitHub


sonarcloud[bot] commented on PR #4959:
URL: https://github.com/apache/hive/pull/4959#issuecomment-1970288996

   ## Quality Gate passed
   
   Issues
   - [13 New issues](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4959&resolved=false&inNewCodePeriod=true)
   - [0 Accepted issues](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4959&metric=new_accepted_issues&view=list)
   
   Measures
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4959&resolved=false&inNewCodePeriod=true)
   - No data about Coverage
   - No data about Duplication
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4959)
   





Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]

2024-02-28 Thread via GitHub


dengzhhu653 commented on code in PR #4959:
URL: https://github.com/apache/hive/pull/4959#discussion_r1506862344


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException {
 return;
   }
 
+  String nodeValue0 = "?";
   // if Filter.g does date parsing for quoted strings, we'd need to verify 
there's no
   // type mismatch when string col is filtered by a string that looks like 
date.
-  if (colType == FilterType.Date && valType == FilterType.String) {
-// Filter.g cannot parse a quoted date; try to parse date here too.
+  if (colType == FilterType.Date) {
 try {
-  nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue);
+  // check the nodeValue is a valid date
+  nodeValue = MetaStoreUtils.convertDateToString(

Review Comment:
   This converts the string date into a timezone-based date string, which we need in the direct SQL path:
   https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java#L263-L269
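   For illustration only, a self-contained sketch of the kind of timezone-anchored round trip described above; the real formats live in `MetaStoreUtils` and the linked `DatabaseProduct` lines, and all names below are assumptions:
   
   ```java
   import java.time.LocalDate;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;
   
   final class TimezoneDateSketch {
     // Anchoring the formatter to a fixed zone keeps the rendered literal
     // independent of the server's default timezone.
     private static final DateTimeFormatter FMT =
         DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
   
     // Validates the incoming literal and re-renders it in canonical form
     // before it is embedded in a direct-SQL date expression.
     static String toSqlDateLiteral(String value) {
       return FMT.format(LocalDate.parse(value, FMT));
     }
   }
   ```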






Re: [PR] HIVE-12371: Use DriverManager.getLoginTimeout() as fallback [hive]

2024-02-28 Thread via GitHub


github-actions[bot] commented on PR #4964:
URL: https://github.com/apache/hive/pull/4964#issuecomment-1970152782

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Re: [PR] Update Hive version in Docker README [hive]

2024-02-28 Thread via GitHub


sonarcloud[bot] commented on PR #5105:
URL: https://github.com/apache/hive/pull/5105#issuecomment-1969803673

   ## Quality Gate passed
   
   Issues
   - [0 New issues](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=5105&resolved=false&inNewCodePeriod=true)
   - [0 Accepted issues](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=5105&metric=new_accepted_issues&view=list)
   
   Measures
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=5105&resolved=false&inNewCodePeriod=true)
   - No data about Coverage
   - No data about Duplication
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=5105)
   





[PR] Update Hive version in Docker README [hive]

2024-02-28 Thread via GitHub


mertyyanik opened a new pull request, #5105:
URL: https://github.com/apache/hive/pull/5105

   
   
   ### What changes were proposed in this pull request?
   While the quickstart page for Docker was up to date, the README was not. I added the updates from that page to the README.
   Quickstart page: https://hive.apache.org/developement/quickstart/
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   It lets users see an up-to-date README.
   
   
   
   ### Is the change a dependency upgrade?
   No.
   
   
   
   ### How was this patch tested?
   Since this only changes the README, I confirmed that the README renders properly as Markdown.
   
   





Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


sonarcloud[bot] commented on PR #5103:
URL: https://github.com/apache/hive/pull/5103#issuecomment-1969626662

   ## Quality Gate passed
   
   Issues
   - [2 New issues](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=5103&resolved=false&inNewCodePeriod=true)
   - [0 Accepted issues](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=5103&metric=new_accepted_issues&view=list)
   
   Measures
   - [1 Security Hotspot](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=5103&resolved=false&inNewCodePeriod=true)
   - No data about Coverage
   - No data about Duplication
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=5103)
   





Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]

2024-02-28 Thread via GitHub


deniskuzZ commented on code in PR #4959:
URL: https://github.com/apache/hive/pull/4959#discussion_r1506164080


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1505,14 +1507,9 @@ public void visit(LeafNode node) throws MetaException {
   params.add(catName.toLowerCase());
 }
 tableValue += " then " + tableValue0 + " else null end)";
-
-if (valType == FilterType.Date) {
-   tableValue = dbType.toDate(tableValue);
-} else if (valType == FilterType.Timestamp) {
-  tableValue = dbType.toTimestamp(tableValue);
-}
   }
-  if (!node.isReverseOrder) {
+
+  if (!node.isReverseOrder && nodeValue != null) {

Review Comment:
    






Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]

2024-02-28 Thread via GitHub


deniskuzZ commented on code in PR #4959:
URL: https://github.com/apache/hive/pull/4959#discussion_r1506152341


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException {
 return;
   }
 
+  String nodeValue0 = "?";
   // if Filter.g does date parsing for quoted strings, we'd need to verify 
there's no
   // type mismatch when string col is filtered by a string that looks like 
date.
-  if (colType == FilterType.Date && valType == FilterType.String) {
-// Filter.g cannot parse a quoted date; try to parse date here too.
+  if (colType == FilterType.Date) {
 try {
-  nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue);
+  // check the nodeValue is a valid date
+  nodeValue = MetaStoreUtils.convertDateToString(

Review Comment:
   PartFilterVisitor now returns a String date representation after validation, doesn't it? The question wasn't about changing the returned type, but about why we need these two conversions:
   
   nodeValue = MetaStoreUtils.convertDateToString(MetaStoreUtils.convertStringToDate((String) nodeValue));
   
   If we really need them for some reason, could we extract that logic into `MetaStoreUtils.normalizeDate()`?
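   For what it's worth, a minimal sketch of the suggested extraction, assuming the `convertStringToDate`/`convertDateToString` pair behaves as used in the diff above (illustrative, not actual MetaStoreUtils code):
   
   ```java
   // Hypothetical helper on MetaStoreUtils: one round trip that both
   // validates the literal and normalizes its textual form.
   public static String normalizeDate(String value) throws MetaException {
     return convertDateToString(convertStringToDate(value));
   }
   ```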







Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]

2024-02-28 Thread via GitHub


deniskuzZ commented on code in PR #4959:
URL: https://github.com/apache/hive/pull/4959#discussion_r1506152341


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException {
 return;
   }
 
+  String nodeValue0 = "?";
   // if Filter.g does date parsing for quoted strings, we'd need to verify 
there's no
   // type mismatch when string col is filtered by a string that looks like 
date.
-  if (colType == FilterType.Date && valType == FilterType.String) {
-// Filter.g cannot parse a quoted date; try to parse date here too.
+  if (colType == FilterType.Date) {
 try {
-  nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue);
+  // check the nodeValue is a valid date
+  nodeValue = MetaStoreUtils.convertDateToString(

Review Comment:
   PartFilterVisitor returns a String date representation after validation, doesn't it?



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException {
 return;
   }
 
+  String nodeValue0 = "?";
   // if Filter.g does date parsing for quoted strings, we'd need to verify 
there's no
   // type mismatch when string col is filtered by a string that looks like 
date.
-  if (colType == FilterType.Date && valType == FilterType.String) {
-// Filter.g cannot parse a quoted date; try to parse date here too.
+  if (colType == FilterType.Date) {
 try {
-  nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue);
+  // check the nodeValue is a valid date
+  nodeValue = MetaStoreUtils.convertDateToString(

Review Comment:
   PartFilterVisitor now returns a String date representation after validation, doesn't it?






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


sonarcloud[bot] commented on PR #5076:
URL: https://github.com/apache/hive/pull/5076#issuecomment-1969109963

   ## Quality Gate passed
   
   Issues
   - [5 New issues](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=5076&resolved=false&inNewCodePeriod=true)
   - [0 Accepted issues](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=5076&metric=new_accepted_issues&view=list)
   
   Measures
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=5076&resolved=false&inNewCodePeriod=true)
   - No data about Coverage
   - No data about Duplication
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=5076)
   





Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on PR #5103:
URL: https://github.com/apache/hive/pull/5103#issuecomment-1968997117

   > Minor Stuff, else looks good
   
   Thanks a lot, addressed your comments.





Re: [PR] HIVE-25972: HIVE_VECTORIZATION_USE_ROW_DESERIALIZE in hiveconf.java imply default value is false,in fact the default value is 'true' [hive]

2024-02-28 Thread via GitHub


SourabhBadhya merged PR #3824:
URL: https://github.com/apache/hive/pull/3824





Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on code in PR #5103:
URL: https://github.com/apache/hive/pull/5103#discussion_r1505911098


##
ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java:
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException;
+import org.apache.hadoop.hive.ql.hooks.HookContext;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestReExecuteLostAMQueryPlugin {
+
+  @Test
+  public void testRetryOnUnmanagedAmFailure() throws Exception {
+ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin();
+ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook();
+
+HookContext context = new HookContext(null, 
QueryState.getNewQueryState(new HiveConf(), null), null, null, null,
+null, null, null, null, false, null, null);
+context.setHookType(HookContext.HookType.ON_FAILURE_HOOK);
+context.setException(new TezRuntimeException("dag_0_0", "AM record not 
found (likely died)"));
+
+hook.run(context);
+
+Assert.assertEquals(true, plugin.shouldReExecute(1));
+  }
+
+  @Test
+  public void testRetryOnNoCurrentDAGException() throws Exception {

Review Comment:
   ack






Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on code in PR #5103:
URL: https://github.com/apache/hive/pull/5103#discussion_r1505909179


##
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java:
##
@@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception {
   if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) {
 Throwable exception = hookContext.getException();
 
-if (exception != null && exception.getMessage() != null) {
+if (!(exception instanceof TezRuntimeException)) {
+  LOG.info("Exception is not a TezRuntimeException, no need to check 
further with ReExecuteLostAMQueryPlugin");
+  return;
+}
+
+TezRuntimeException tre = (TezRuntimeException)exception;
+
+if (tre != null && tre.getMessage() != null) {
+  dagIds.add(tre.getDagId());
   // When HS2 does not manage the AMs, tez AMs are registered with 
zookeeper and HS2 discovers it,
   // failure of unmanaged AMs will throw AM record not being found in 
zookeeper.
   String unmanagedAMFailure = "AM record not found (likely died)";
-  if 
(lostAMContainerErrorPattern.matcher(exception.getMessage()).matches()
-  || exception.getMessage().contains(unmanagedAMFailure)) {
+  // DAG lost in the scenario described at TEZ-4543
+  String dagLostFailure = "No running DAG at present";
+
+  if (lostAMContainerErrorPattern.matcher(tre.getMessage()).matches()
+  || tre.getMessage().contains(unmanagedAMFailure)
+  || tre.getMessage().contains(dagLostFailure)) {
 retryPossible = true;
   }
-  LOG.info("Got exception message: {} retryPossible: {}", 
exception.getMessage(), retryPossible);
+  LOG.info("Got exception message: {} retryPossible: {}, dags seen so 
far: {}", tre.getMessage(), retryPossible,

Review Comment:
   ack






Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on code in PR #5103:
URL: https://github.com/apache/hive/pull/5103#discussion_r1505907827


##
ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java:
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException;
+import org.apache.hadoop.hive.ql.hooks.HookContext;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestReExecuteLostAMQueryPlugin {
+
+  @Test
+  public void testRetryOnUnmanagedAmFailure() throws Exception {
+ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin();
+ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook();
+
+HookContext context = new HookContext(null, 
QueryState.getNewQueryState(new HiveConf(), null), null, null, null,
+null, null, null, null, false, null, null);
+context.setHookType(HookContext.HookType.ON_FAILURE_HOOK);
+context.setException(new TezRuntimeException("dag_0_0", "AM record not 
found (likely died)"));
+
+hook.run(context);
+
+Assert.assertEquals(true, plugin.shouldReExecute(1));

Review Comment:
   ack



##
ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java:
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException;
+import org.apache.hadoop.hive.ql.hooks.HookContext;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestReExecuteLostAMQueryPlugin {
+
+  @Test
+  public void testRetryOnUnmanagedAmFailure() throws Exception {
+ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin();
+ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook();
+
+HookContext context = new HookContext(null, 
QueryState.getNewQueryState(new HiveConf(), null), null, null, null,
+null, null, null, null, false, null, null);
+context.setHookType(HookContext.HookType.ON_FAILURE_HOOK);
+context.setException(new TezRuntimeException("dag_0_0", "AM record not 
found (likely died)"));
+
+hook.run(context);
+
+Assert.assertEquals(true, plugin.shouldReExecute(1));
+  }
+
+  @Test
+  public void testRetryOnNoCurrentDAGException() throws Exception {
+ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin();
+ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook();
+
+HookContext context = new HookContext(null, 
QueryState.getNewQueryState(new HiveConf(), null), null, null, null,
+null, null, null, null, false, null, null);
+context.setHookType(HookContext.HookType.ON_FAILURE_HOOK);
+context.setException(new TezRuntimeException("dag_0_0", "No running DAG at 
present"));
+
+hook.run(context);
+
+Assert.assertEquals(true, plugin.shouldReExecute(1));

Review Comment:
   ack




Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on code in PR #5103:
URL: https://github.com/apache/hive/pull/5103#discussion_r1505906262


##
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java:
##
@@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception {
   if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) {
 Throwable exception = hookContext.getException();
 
-if (exception != null && exception.getMessage() != null) {
+if (!(exception instanceof TezRuntimeException)) {
+  LOG.info("Exception is not a TezRuntimeException, no need to check 
further with ReExecuteLostAMQueryPlugin");
+  return;
+}
+
+TezRuntimeException tre = (TezRuntimeException)exception;
+
+if (tre != null && tre.getMessage() != null) {

Review Comment:
   ack



##
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java:
##
@@ -33,6 +36,9 @@
 public class ReExecuteLostAMQueryPlugin implements IReExecutionPlugin {
   private static final Logger LOG = 
LoggerFactory.getLogger(ReExecuteLostAMQueryPlugin.class);
   private boolean retryPossible;
+  // a list to track DAG ids seen by this re-execution plugin during the same 
query
+  // it can help a lot with identifying the previous DAGs in case of retries
+  private List<String> dagIds = new ArrayList<>();

Review Comment:
   ack






Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]

2024-02-28 Thread via GitHub


abstractdog commented on code in PR #5103:
URL: https://github.com/apache/hive/pull/5103#discussion_r1505901461


##
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java:
##
@@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception {
   if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) {
 Throwable exception = hookContext.getException();
 
-if (exception != null && exception.getMessage() != null) {
+if (!(exception instanceof TezRuntimeException)) {
+  LOG.info("Exception is not a TezRuntimeException, no need to check 
further with ReExecuteLostAMQueryPlugin");
+  return;
+}
+
+TezRuntimeException tre = (TezRuntimeException)exception;
+
+if (tre != null && tre.getMessage() != null) {
+  dagIds.add(tre.getDagId());
   // When HS2 does not manage the AMs, tez AMs are registered with 
zookeeper and HS2 discovers it,
   // failure of unmanaged AMs will throw AM record not being found in 
zookeeper.
   String unmanagedAMFailure = "AM record not found (likely died)";
-  if 
(lostAMContainerErrorPattern.matcher(exception.getMessage()).matches()
-  || exception.getMessage().contains(unmanagedAMFailure)) {
+  // DAG lost in the scenario described at TEZ-4543
+  String dagLostFailure = "No running DAG at present";

Review Comment:
   yes, absolutely






Re: [PR] HIVE-28083: Enable HMS client cache and HMS query cache for Explain p… [hive]

2024-02-28 Thread via GitHub


zabetak closed pull request #5092: HIVE-28083: Enable HMS client cache and HMS 
query cache for Explain p…
URL: https://github.com/apache/hive/pull/5092





Re: [PR] HIVE-28094: Improve HMS client cache and query cache performance for … [hive]

2024-02-28 Thread via GitHub


sonarcloud[bot] commented on PR #5102:
URL: https://github.com/apache/hive/pull/5102#issuecomment-1968732069

   ## Quality Gate passed
   
   Issues
   - [9 New issues](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=5102&resolved=false&inNewCodePeriod=true)
   
   Measures
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=5102&resolved=false&inNewCodePeriod=true)
   - No data about Coverage
   - No data about Duplication
   
   [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=5102)
   





Re: [PR] HIVE-27848 - TestCrudCompactorOnTez.secondCompactionShouldBeRefusedBe… [hive]

2024-02-28 Thread via GitHub


zabetak commented on code in PR #4859:
URL: https://github.com/apache/hive/pull/4859#discussion_r1505717881


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java:
##
@@ -295,4 +324,230 @@ public static LockRequest createLockRequest(HiveConf 
conf, CompactionInfo ci, lo
 !conf.getBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK));
 return requestBuilder.build();
   }
+
+  private static CompactionResponse requestCompaction(CompactionInfo ci, 
String runAs, String hostname,
+  String runtimeVersion, TxnStore txnHandler) throws MetaException {
+CompactionRequest compactionRequest = new CompactionRequest(ci.dbname, 
ci.tableName, ci.type);
+if (ci.partName != null)
+  compactionRequest.setPartitionname(ci.partName);
+compactionRequest.setRunas(runAs);
+if (StringUtils.isEmpty(ci.initiatorId)) {
+  compactionRequest.setInitiatorId(hostname + "-" + 
Thread.currentThread().getId());
+} else {
+  compactionRequest.setInitiatorId(ci.initiatorId);
+}
+compactionRequest.setInitiatorVersion(runtimeVersion);
+compactionRequest.setPoolName(ci.poolName);
+LOG.info("Requesting compaction: " + compactionRequest);
+CompactionResponse resp = txnHandler.compact(compactionRequest);
+if (resp.isAccepted()) {
+  ci.id = resp.getId();
+}
+return resp;
+  }
+
+  private static CompactionType determineCompactionType(CompactionInfo ci, 
AcidDirectory dir,
+  Map<String, String> tblProperties, long baseSize, long deltaSize, HiveConf conf) {
+boolean noBase = false;
+List deltas = dir.getCurrentDirectories();
+if (baseSize == 0 && deltaSize > 0) {
+  noBase = true;
+} else {
+  String deltaPctProp =
+  tblProperties.get(COMPACTOR_THRESHOLD_PREFIX + 
HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD);
+  float deltaPctThreshold = deltaPctProp == null ? 
HiveConf.getFloatVar(conf,
+  HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD) : 
Float.parseFloat(deltaPctProp);
+  boolean bigEnough = (float) deltaSize / (float) baseSize > 
deltaPctThreshold;
+  boolean multiBase = dir.getObsolete().stream().anyMatch(path -> 
path.getName().startsWith(AcidUtils.BASE_PREFIX));
+
+  boolean initiateMajor = bigEnough || (deltaSize == 0 && multiBase);
+  if (LOG.isDebugEnabled()) {
+StringBuilder msg = new StringBuilder("delta size: ");
+msg.append(deltaSize);
+msg.append(" base size: ");
+msg.append(baseSize);
+msg.append(" multiBase ");
+msg.append(multiBase);
+msg.append(" deltaSize ");
+msg.append(deltaSize);
+msg.append(" threshold: ");
+msg.append(deltaPctThreshold);
+msg.append(" delta/base ratio > 
").append(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD.varname)
+.append(": ");
+msg.append(bigEnough);
+msg.append(".");
+if (!initiateMajor) {
+  msg.append("not");
+}
+msg.append(" initiating major compaction.");
+LOG.debug(msg.toString());
+  }
+  if (initiateMajor)
+return CompactionType.MAJOR;
+}
+
+String deltaNumProp =
+tblProperties.get(COMPACTOR_THRESHOLD_PREFIX + 
HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD);
+int deltaNumThreshold = deltaNumProp == null ? HiveConf.getIntVar(conf,
+HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD) : 
Integer.parseInt(deltaNumProp);
+boolean enough = deltas.size() > deltaNumThreshold;
+if (!enough) {
+  LOG.debug(
+  "Not enough deltas to initiate compaction for table=" + ci.tableName 
+ "partition=" + ci.partName
+  + ". Found: " + deltas.size() + " deltas, threshold is " + 
deltaNumThreshold);
+  return null;
+}
+// If there's no base file, do a major compaction
+LOG.debug("Found " + deltas.size() + " delta files, and " + (noBase ? "no" 
: "has") + " base," + "requesting "
++ (noBase ? "major" : "minor") + " compaction");
+
+return noBase || !isMinorCompactionSupported(conf, tblProperties,
+dir) ? CompactionType.MAJOR : CompactionType.MINOR;
+  }
+
+  private static long getBaseSize(AcidDirectory dir) throws IOException {
+long baseSize = 0;
+if (dir.getBase() != null) {
+  baseSize = getDirSize(dir.getFs(), dir.getBase());
+} else {
+  for (HadoopShims.HdfsFileStatusWithId origStat : dir.getOriginalFiles()) 
{
+baseSize += origStat.getFileStatus().getLen();
+  }
+}
+return baseSize;
+  }
+
+  private static long getDirSize(FileSystem fs, AcidUtils.ParsedDirectory dir) 
throws IOException {
+return dir.getFiles(fs, 
Ref.from(false)).stream().map(HadoopShims.HdfsFileStatusWithId::getFileStatus)
+.mapToLong(FileStatus::getLen).sum();
+  }
+
+  private static CompactionType checkForCompaction(final CompactionInfo ci, 
final ValidWriteIdList writeIds,
+  final StorageDescriptor sd, final Map 

Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505610108


##
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:
##
@@ -370,7 +371,8 @@ private InputSplit[] getCombineSplits(JobConf job, int 
numSplits,
   PartitionDesc part = HiveFileFormatUtils.getFromPathRecursively(
   pathToPartitionInfo, path, 
IOPrepareCache.get().allocatePartitionDescMap());
   TableDesc tableDesc = part.getTableDesc();
-  if ((tableDesc != null) && tableDesc.isNonNative()) {
+  boolean useDefaultFileFormat = 
part.getInputFileFormatClass().equals(tableDesc.getInputFileFormatClass());

Review Comment:
   Done.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505609448


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) 
throws IOException {
   LOG.info("CommitTask found no serialized table in config for 
table: {}.", output);
 }
   }, IOException.class);
+
+  // Merge task has merged several files into one. Hence we need to remove 
the stale files.
+  // At this stage the file is written and task-committed, but the old 
files are still present.
+  if 
(CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) {
+MapWork mrwork = Utilities.getMapWork(jobConf);
+if (mrwork != null) {
+  List<Path> mergedPaths = mrwork.getInputPaths();
+  if (CollectionUtils.isNotEmpty(mergedPaths)) {
+Tasks.foreach(mergedPaths)
+.retry(3)
+.stopOnFailure()
+.throwFailureWhenFinished()

Review Comment:
   I think we should not fail here. Removed `stopOnFailure` and `throwFailureWhenFinished` so that cleanup failures do not fail the task.
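   A sketch of the relaxed cleanup, assuming Iceberg's `Tasks` builder as used in the diff above (`mergedPaths`, `fs`, and `LOG` stand in for the surrounding code):
   
   ```java
   // Best-effort deletion: retries each path, logs failures, never throws.
   Tasks.foreach(mergedPaths)
       .retry(3)
       .suppressFailureWhenFinished()
       .onFailure((path, exc) -> LOG.warn("Could not remove stale file {}", path, exc))
       .run(path -> fs.delete(path, true), IOException.class);
   ```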






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505607838


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java:
##
@@ -1720,6 +1723,41 @@ protected static MoveWork mergeMovePaths(Path 
condInputPath, MoveWork linkedMove
 return newWork;
   }
 
+  private static void 
setCustomStorageHandlerPropertiesForMerge(ConditionalResolverMergeFilesCtx 
mrCtx, MoveWork work) {

Review Comment:
   Renamed it to `setCustomStorageHandlerAndPropertiesForMerge`.



##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java:
##
@@ -1720,6 +1723,41 @@ protected static MoveWork mergeMovePaths(Path 
condInputPath, MoveWork linkedMove
 return newWork;
   }
 
+  private static void 
setCustomStorageHandlerPropertiesForMerge(ConditionalResolverMergeFilesCtx 
mrCtx, MoveWork work) {

Review Comment:
   Renamed it to `setStorageHandlerAndPropertiesForMerge`.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505607087


##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -147,18 +171,23 @@ public List> getTasks(HiveConf conf, Object 
objCtx) {
   Path dirPath = new Path(dirName);
   FileSystem inpFs = dirPath.getFileSystem(conf);
   DynamicPartitionCtx dpCtx = ctx.getDPCtx();
+  HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, 
ctx.getStorageHandlerClass());
+  boolean dirExists = inpFs.exists(dirPath);
+  boolean useCustomStorageHandler = storageHandler != null && 
storageHandler.supportsMergeFiles();
 
-  if (inpFs.exists(dirPath)) {
+  MapWork work = null;
+  if (dirExists || useCustomStorageHandler) {

Review Comment:
   Removed.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505606712


##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -254,18 +286,26 @@ private void generateActualTasks(HiveConf conf, 
List> resTsks,
   long trgtSize, long avgConditionSize, Task mvTask,
   Task mrTask, Task mrAndMvTask, Path dirPath,
   FileSystem inpFs, ConditionalResolverMergeFilesCtx ctx, MapWork work, 
int dpLbLevel,
-  boolean manifestFilePresent)
-  throws IOException {
+  boolean manifestFilePresent, HiveStorageHandler storageHandler)
+  throws IOException, ClassNotFoundException {
 DynamicPartitionCtx dpCtx = ctx.getDPCtx();
 List statusList;
-Map> manifestDirToFile = new HashMap<>();
+Map> parentDirToFile = new HashMap<>();
+boolean useCustomStorageHandler = storageHandler != null && 
storageHandler.supportsMergeFiles();
+StorageHandlerMergeProperties mergeProperties = useCustomStorageHandler ?
+
storageHandler.getStorageHandlerMergeProperties(ctx.getCustomStorageHandlerProps())
 : null;
 if (manifestFilePresent) {
   // Get the list of files from manifest file.
   List fileStatuses = getManifestFilePaths(conf, dirPath);
   // Setup the work to include all the files present in the manifest.
   setupWorkWhenUsingManifestFile(work, fileStatuses, dirPath, false);
-  manifestDirToFile = getManifestDirs(inpFs, fileStatuses);
-  statusList = new ArrayList<>(manifestDirToFile.keySet());
+  parentDirToFile = getParentDirToFileMap(inpFs, fileStatuses);
+  statusList = new ArrayList<>(parentDirToFile.keySet());
+} else if (useCustomStorageHandler) {
+  List<FileStatus> fileStatuses = storageHandler.getMergeInputFiles(ctx.getCustomStorageHandlerProps());
+  setupWorkWhenUsingCustomHandler(work, dirPath, mergeProperties);
+  parentDirToFile = getParentDirToFileMap(inpFs, fileStatuses);
+  statusList = new ArrayList<>(parentDirToFile.keySet());

Review Comment:
   Done.



##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -254,18 +286,26 @@ private void generateActualTasks(HiveConf conf, 
List> resTsks,
   long trgtSize, long avgConditionSize, Task mvTask,
   Task mrTask, Task mrAndMvTask, Path dirPath,
   FileSystem inpFs, ConditionalResolverMergeFilesCtx ctx, MapWork work, 
int dpLbLevel,
-  boolean manifestFilePresent)
-  throws IOException {
+  boolean manifestFilePresent, HiveStorageHandler storageHandler)
+  throws IOException, ClassNotFoundException {
 DynamicPartitionCtx dpCtx = ctx.getDPCtx();
 List statusList;
-Map> manifestDirToFile = new HashMap<>();
+Map> parentDirToFile = new HashMap<>();
+boolean useCustomStorageHandler = storageHandler != null && 
storageHandler.supportsMergeFiles();
+StorageHandlerMergeProperties mergeProperties = useCustomStorageHandler ?
+
storageHandler.getStorageHandlerMergeProperties(ctx.getCustomStorageHandlerProps())
 : null;
 if (manifestFilePresent) {
   // Get the list of files from manifest file.
   List fileStatuses = getManifestFilePaths(conf, dirPath);
   // Setup the work to include all the files present in the manifest.
   setupWorkWhenUsingManifestFile(work, fileStatuses, dirPath, false);
-  manifestDirToFile = getManifestDirs(inpFs, fileStatuses);
-  statusList = new ArrayList<>(manifestDirToFile.keySet());
+  parentDirToFile = getParentDirToFileMap(inpFs, fileStatuses);
+  statusList = new ArrayList<>(parentDirToFile.keySet());

Review Comment:
   Done.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505606288


##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -125,6 +133,22 @@ public ListBucketingCtx getLbCtx() {
 public void setLbCtx(ListBucketingCtx lbCtx) {
   this.lbCtx = lbCtx;
 }
+
+public void setCustomStorageHandlerProps(Properties properties) {

Review Comment:
   Removed.



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -2021,4 +2023,34 @@ public List 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta
   throw new SemanticException(String.format("Error while fetching the 
partitions due to: %s", e));
 }
   }
+
+  @Override
+  public boolean supportsMergeFiles() {
+return true;
+  }
+
+  @Override
+  public List<FileStatus> getMergeInputFiles(Properties properties) throws IOException {
+    String tableName = properties.getProperty(Catalogs.NAME);
+    String snapshotRef = properties.getProperty(Catalogs.SNAPSHOT_REF);
+    Configuration configuration = SessionState.getSessionConf();
+    List<JobContext> originalContextList = generateJobContext(configuration, tableName, snapshotRef);
+    List<JobContext> jobContextList = originalContextList.stream()
+        .map(TezUtil::enrichContextWithVertexId)
+        .collect(Collectors.toList());
+    if (originalContextList.isEmpty()) {
+      return Collections.emptyList();
+    }
+    List<FileStatus> dataFiles = Lists.newArrayList();
+    for (DataFile dataFile : new HiveIcebergOutputCommitter().getWrittenFiles(jobContextList)) {
+      FileSystem fs = new Path(dataFile.path().toString()).getFileSystem(conf);
+      dataFiles.add(fs.getFileStatus(new Path(dataFile.path().toString())));
+    }
+    return dataFiles;
+  }
+  }
+
+  @Override
+  public StorageHandlerMergeProperties 
getStorageHandlerMergeProperties(Properties properties) {

Review Comment:
   Renamed.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505604310


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java:
##
@@ -747,4 +750,18 @@ default List 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.T
 throw new UnsupportedOperationException("Storage handler does not support 
getting partitions by expression " +
 "for a table.");
   }
+
+  default boolean supportsMergeFiles() {
+return false;
+  }
+
+  default List<FileStatus> getMergeInputFiles(Properties properties) throws IOException {
+throw new UnsupportedOperationException("Storage handler does not support 
getting merge input files " +
+"for a table.");
+  }
+
+  default StorageHandlerMergeProperties 
getStorageHandlerMergeProperties(Properties properties) {

Review Comment:
   Renamed it to `getMergeProperties`.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505603811


##
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:
##
@@ -691,6 +693,18 @@ Map> removeScheme(Map> pathToAliases) {
   public RecordReader getRecordReader(InputSplit split, JobConf job,
   Reporter reporter) throws IOException {
 if (!(split instanceof CombineHiveInputSplit)) {
+  if (split instanceof FileSplit) {

Review Comment:
   If the conditions are satisfied, we use the `inputFormat` to get the record reader; otherwise we fall back to the previous behaviour.
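   In other words, roughly (a hedged paraphrase of the diff above, with stand-in names, not the merged code):
   
   ```java
   if (!(split instanceof CombineHiveInputSplit)) {
     if (split instanceof FileSplit && !useDefaultFileFormat) {
       // The partition declares its own input format (e.g. an Iceberg merge
       // input), so delegate to it instead of rejecting the split.
       return inputFormat.getRecordReader(split, job, reporter);
     }
     // otherwise fall back to the previous behaviour
   }
   ```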






Re: [PR] HIVE-28084: Iceberg: COW fix for Merge operation [hive]

2024-02-28 Thread via GitHub


deniskuzZ merged PR #5088:
URL: https://github.com/apache/hive/pull/5088





Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505602500


##
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:
##
@@ -691,6 +693,18 @@ Map<Path, ArrayList<String>> removeScheme(Map<Path, ArrayList<String>> pathToAliases) {
   public RecordReader getRecordReader(InputSplit split, JobConf job,
   Reporter reporter) throws IOException {
 if (!(split instanceof CombineHiveInputSplit)) {
+  if (split instanceof FileSplit) {
+        Map<Path, PartitionDesc> pathToPartitionInfo = 
Utilities.getMapWork(job).getPathToPartitionInfo();
+Path path = ((FileSplit) split).getPath();
+PartitionDesc partitionDesc = 
HiveFileFormatUtils.getFromPathRecursively(pathToPartitionInfo, path,
+IOPrepareCache.get().getPartitionDescMap());
+boolean useDefaultFileFormat = partitionDesc.getInputFileFormatClass()
+
.equals(partitionDesc.getTableDesc().getInputFileFormatClass());

Review Comment:
   Done.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505602116


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
+import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties;
+import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergStorageHandlerMergeProperties implements 
StorageHandlerMergeProperties {

Review Comment:
   Thanks for the info. Added code to use it now.



##
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:
##
@@ -370,7 +371,8 @@ private InputSplit[] getCombineSplits(JobConf job, int 
numSplits,
   PartitionDesc part = HiveFileFormatUtils.getFromPathRecursively(
   pathToPartitionInfo, path, 
IOPrepareCache.get().allocatePartitionDescMap());
   TableDesc tableDesc = part.getTableDesc();
-  if ((tableDesc != null) && tableDesc.isNonNative()) {
+  boolean useDefaultFileFormat = 
part.getInputFileFormatClass().equals(tableDesc.getInputFileFormatClass());
+  if ((tableDesc != null) && tableDesc.isNonNative() && 
useDefaultFileFormat) {

Review Comment:
   Removed.






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505601391


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
+import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties;
+import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergStorageHandlerMergeProperties implements 
StorageHandlerMergeProperties {
+
+  private final Properties properties;
+
+  IcebergStorageHandlerMergeProperties(Properties properties) {
+this.properties = properties;
+  }
+
+  public Path getTmpLocation() {
+String location = properties.getProperty(Catalogs.LOCATION);
+return new Path(location + "/data/");
+  }
+
+  public String getInputFileFormatClassName() {
+FileFormat fileFormat = 
FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
+if (fileFormat == FileFormat.ORC) {

Review Comment:
   Using `StorageFormatDescriptor` in the `StorageHandlerMergeProperties` 
itself. Done.
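   For context, a minimal sketch of resolving the three class names through the
   existing descriptor registry (assuming `StorageFormatFactory` and
   `StorageFormatDescriptor` from `org.apache.hadoop.hive.ql.io`):

   // The format name comes from the Iceberg table property, e.g. "orc", "parquet", "avro".
   StorageFormatDescriptor descriptor = new StorageFormatFactory()
   .get(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
   if (descriptor != null) {
 String inputFormatClassName = descriptor.getInputFormat();
 String outputFormatClassName = descriptor.getOutputFormat();
 String serdeClassName = descriptor.getSerde();
   }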






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505599549


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -2021,4 +2023,34 @@ public List 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta
   throw new SemanticException(String.format("Error while fetching the 
partitions due to: %s", e));
 }
   }
+
+  @Override
+  public boolean supportsMergeFiles() {
+return true;
+  }
+
+  @Override
+  public List<FileStatus> getMergeInputFiles(Properties properties) throws 
IOException {
+String tableName = properties.getProperty(Catalogs.NAME);
+String snapshotRef = properties.getProperty(Catalogs.SNAPSHOT_REF);
+Configuration configuration = SessionState.getSessionConf();
+    List<JobContext> originalContextList = generateJobContext(configuration, 
tableName, snapshotRef);
+    List<JobContext> jobContextList = originalContextList.stream()
+.map(TezUtil::enrichContextWithVertexId)
+.collect(Collectors.toList());
+if (originalContextList.isEmpty()) {
+  return Collections.emptyList();
+}
+    List<FileStatus> dataFiles = Lists.newArrayList();
+for (DataFile dataFile : new 
HiveIcebergOutputCommitter().getWrittenFiles(jobContextList)) {

Review Comment:
   > why do you need new instance of HiveIcebergOutputCommitter, can't you make 
the method static? Maybe we should move it to Iceberg util class
   
   Kept it in the same place so that we could make use of existing helper 
functions already present in HiveIcebergOutputCommitter.
   
   > Why do you need to enrichContextWithVertexId again inside of 
getWrittenFiles?
   
   Removed and doing only once.
   
   >  Why don't we return List dataFiles right away, but add 1 more 
foreach with transformation?
   
   Added functionality to directly convert it into `FileStatus` objects.
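   Put together, a sketch of the reworked helper after these three changes
   (simplified; not the exact PR code):

   List<JobContext> jobContexts = originalContextList.stream()
   .map(TezUtil::enrichContextWithVertexId)   // enrichment happens once, here
   .collect(Collectors.toList());
   List<FileStatus> dataFiles = Lists.newArrayList();
   for (DataFile dataFile : new HiveIcebergOutputCommitter().getWrittenFiles(jobContexts)) {
 Path path = new Path(dataFile.path().toString());
 dataFiles.add(path.getFileSystem(conf).getFileStatus(path));   // direct FileStatus conversion
   }
   return dataFiles;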






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505595228


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -720,4 +744,42 @@ private static FilesForCommit readFileForCommit(String 
fileForCommitLocation, Fi
   throw new NotFoundException("Can not read or parse committed file: %s", 
fileForCommitLocation);
 }
   }
+
+  public List<DataFile> getWrittenFiles(List<JobContext> jobContexts) throws IOException {
+    List<JobContext> jobContextList = jobContexts.stream()
+.map(TezUtil::enrichContextWithVertexId)
+.collect(Collectors.toList());
+List outputs = collectOutputs(jobContextList);
+ExecutorService fileExecutor = 
fileExecutor(jobContextList.get(0).getJobConf());
+ExecutorService tableExecutor = 
tableExecutor(jobContextList.get(0).getJobConf(), outputs.size());
+    Collection<DataFile> dataFiles = new ConcurrentLinkedQueue<>();
+try {
+  Tasks.foreach(outputs.stream().flatMap(kv -> kv.jobContexts.stream()
+      .map(jobContext -> new SimpleImmutableEntry<>(kv.table, jobContext))))
+  .suppressFailureWhenFinished()
+  .executeWith(tableExecutor)
+  .onFailure((output, exc) -> LOG.warn("Failed to retrieve merge 
input file for the table {}", output, exc))
+  .run(output -> {
+JobContext jobContext = output.getValue();
+JobConf jobConf = jobContext.getJobConf();
+LOG.info("Cleaning job for jobID: {}, table: {}", 
jobContext.getJobID(), output);
+
+Table table = output.getKey();
+String jobLocation = generateJobLocation(table.location(), 
jobConf, jobContext.getJobID());
+// list jobLocation to get number of forCommit files
+// we do this because map/reduce num in jobConf is unreliable
+// and we have no access to vertex status info
+int numTasks = listForCommits(jobConf, jobLocation).size();
+FilesForCommit results = collectResults(numTasks, 
fileExecutor, table.location(), jobContext,
+table.io(), false);
+dataFiles.addAll(results.dataFiles());
+  }, IOException.class);
+} finally {
+  fileExecutor.shutdown();
+  if (tableExecutor != null) {

Review Comment:
   `fileExecutor` is always created, whereas `tableExecutor` can be null.
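   i.e. the guarded shutdown presumably reads (sketch):

   } finally {
 fileExecutor.shutdown();// unconditionally created above
 if (tableExecutor != null) {// only created when a table-level pool is needed
   tableExecutor.shutdown();
 }
   }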






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505593646


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##
@@ -178,7 +178,7 @@ public RecordReader> 
getRecordReader(InputSplit split, J
 
   @Override
   public boolean shouldSkipCombine(Path path, Configuration conf) {
-return true;
+return false;

Review Comment:
   This function is used during the generation of splits in CombineRecordReader
(i.e. during execution of merge tasks). It is a flag that tells whether the file
format supports merging.
   `hive.merge.tezfiles=false` will still work to disable the merge functionality.



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) 
throws IOException {
   LOG.info("CommitTask found no serialized table in config for 
table: {}.", output);
 }
   }, IOException.class);
+
+  // Merge task has merged several files into one. Hence we need to remove 
the stale files.

Review Comment:
   Done.









Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505590111


##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -512,7 +555,30 @@ private void setupWorkWhenUsingManifestFile(MapWork 
mapWork, List fi
 mapWork.setUseInputPathsDirectly(true);
   }
 
-  private Map> getManifestDirs(FileSystem inpFs, 
List fileStatuses)
+  private void setupWorkWhenUsingCustomHandler(MapWork mapWork, Path dirPath,
+   StorageHandlerMergeProperties 
mergeProperties) throws ClassNotFoundException {
+    Map<String, Operator<? extends OperatorDesc>> aliasToWork = mapWork.getAliasToWork();
+    Map<Path, PartitionDesc> pathToPartitionInfo = mapWork.getPathToPartitionInfo();
+    Operator<? extends OperatorDesc> op = aliasToWork.get(dirPath.toString());
+PartitionDesc partitionDesc = pathToPartitionInfo.get(dirPath);
+Path tmpDir = mergeProperties.getTmpLocation();

Review Comment:
   `mergeProperties` is present when it's used with a storage handler that
supports it. When the boolean `customStorageHandler` is true, it guarantees that
`mergeProperties` is not null.
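   In other words, the intended invariant at the call site (a sketch with
   hypothetical wiring; names as in this thread):

   if (customStorageHandler) {
 // mergeProperties came from a handler that supports merge, so it is
 // non-null on this branch by construction.
 setupWorkWhenUsingCustomHandler(mapWork, dirPath, mergeProperties);
   }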






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505587606


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat;
+import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
+import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties;
+import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.mr.Catalogs;
+
+public class IcebergStorageHandlerMergeProperties implements 
StorageHandlerMergeProperties {
+
+  private final Properties properties;
+
+  IcebergStorageHandlerMergeProperties(Properties properties) {
+this.properties = properties;
+  }
+
+  public Path getTmpLocation() {
+String location = properties.getProperty(Catalogs.LOCATION);
+return new Path(location + "/data/");
+  }
+
+  public String getInputFileFormatClassName() {
+FileFormat fileFormat = 
FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
+if (fileFormat == FileFormat.ORC) {
+  return OrcInputFormat.class.getName();
+} else if (fileFormat == FileFormat.PARQUET) {
+  return MapredParquetInputFormat.class.getName();
+} else if (fileFormat == FileFormat.AVRO) {
+  return AvroContainerInputFormat.class.getName();
+}
+return null;
+  }
+
+  public String getOutputFileFormatClassName() {
+FileFormat fileFormat = 
FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
+if (fileFormat == FileFormat.ORC) {
+  return OrcOutputFormat.class.getName();
+} else if (fileFormat == FileFormat.PARQUET) {
+  return MapredParquetOutputFormat.class.getName();
+} else if (fileFormat == FileFormat.AVRO) {
+  return AvroContainerOutputFormat.class.getName();
+}
+return null;
+  }
+
+  public String getFileSerdeClassName() {
+FileFormat fileFormat = 
FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
+if (fileFormat == FileFormat.ORC) {
+  return OrcSerde.class.getName();
+} else if (fileFormat == FileFormat.PARQUET) {
+  return ParquetHiveSerDe.class.getName();
+} else if (fileFormat == FileFormat.AVRO) {
+  return AvroSerDe.class.getName();
+}
+return null;
+  }

Review Comment:
   Replaced with existing StorageFormatDescriptor. Done.



##
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:
##
@@ -527,4 +593,20 @@ private Map> 
getManifestDirs(FileSystem inpFs, List
 }
 return manifestDirsToPaths;
   }
+
+  private void setMergePropertiesToPartDesc(PartitionDesc partitionDesc,
+StorageHandlerMergeProperties 
mergeProperties) throws ClassNotFoundException{
+String inputFileFormatClassName = 
mergeProperties.getInputFileFormatClassName();
+String outputFileFormatClassName = 
mergeProperties.getOutputFileFormatClassName();
+String serdeClassName = mergeProperties.getFileSerdeClassName();

Review Comment:
   Added a null check.
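   Presumably along these lines (a sketch of the null check; the exact PR code
   may differ):

   if (inputFileFormatClassName != null && outputFileFormatClassName != null
   && serdeClassName != null) {
 partitionDesc.setInputFileFormatClass(
 (Class<? extends InputFormat>) Class.forName(inputFileFormatClassName));
 // ... same pattern for the output file format and the serde class.
   }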




Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505587079


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) 
throws IOException {
   LOG.info("CommitTask found no serialized table in config for 
table: {}.", output);
 }
   }, IOException.class);
+
+  // Merge task has merged several files into one. Hence we need to remove 
the stale files.
+  // At this stage the file is written and task-committed, but the old 
files are still present.
+  if 
(CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) {
+MapWork mrwork = Utilities.getMapWork(jobConf);
+if (mrwork != null) {
+  List mergedPaths = mrwork.getInputPaths();
+  if (CollectionUtils.isNotEmpty(mergedPaths)) {
+Tasks.foreach(mergedPaths)

Review Comment:
   Removed the empty-collection check and replaced it with a null check, which
is necessary. Done.
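   Sketched with the null check discussed here (error handling elided; not the
   exact PR code):

   MapWork mrwork = Utilities.getMapWork(jobConf);
   if (mrwork != null) {  // null check replaces the old isEmpty() check
 List<Path> mergedPaths = mrwork.getInputPaths();
 if (mergedPaths != null) {
   Tasks.foreach(mergedPaths)
   .suppressFailureWhenFinished()
   .run(path -> path.getFileSystem(jobConf).delete(path, true), IOException.class);
 }
   }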






Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]

2024-02-28 Thread via GitHub


SourabhBadhya commented on code in PR #5076:
URL: https://github.com/apache/hive/pull/5076#discussion_r1505586021


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) 
throws IOException {
   LOG.info("CommitTask found no serialized table in config for 
table: {}.", output);
 }
   }, IOException.class);
+
+  // Merge task has merged several files into one. Hence we need to remove 
the stale files.
+  // At this stage the file is written and task-committed, but the old 
files are still present.
+  if 
(CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) {

Review Comment:
   Done.


