Re: [PR] HIVE-12371: Use DriverManager.getLoginTimeout() as fallback [hive]
jneira-stratio commented on PR #4964: URL: https://github.com/apache/hive/pull/4964#issuecomment-1970553683 > there are some 180+ test failures & all looks related They may well be. However, I can't see how the change _standalone_ could cause them, as it restores a previous behaviour (prior to #1611), and DriverManager's loginTimeout default value is 0 as far as I can see (the previous fixed default value). Sorry, I don't have time to take a deeper look; feel free to close it (it will be auto-closed anyway, I guess). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
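The fallback behaviour discussed above can be sketched roughly as follows. This is a minimal illustration only: `resolveLoginTimeout` is a hypothetical helper, not the actual Hive patch, and it relies on the JDBC-specified default of `DriverManager.getLoginTimeout()` being 0, i.e. "no limit" (the fixed default used before #1611).

```java
import java.sql.DriverManager;

public class LoginTimeoutFallback {

    // Hypothetical helper: prefer an explicitly configured timeout (in seconds);
    // otherwise fall back to the global DriverManager value, whose JDBC default
    // is 0 ("no limit"), matching the fixed default used prior to #1611.
    static int resolveLoginTimeout(int configuredSeconds) {
        return configuredSeconds > 0 ? configuredSeconds : DriverManager.getLoginTimeout();
    }
}
```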
Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]
sonarcloud[bot] commented on PR #4959: URL: https://github.com/apache/hive/pull/4959#issuecomment-1970288996 **Quality Gate passed** Issues: 13 New issues, 0 Accepted issues. Measures: 0 Security Hotspots; no data about Coverage; no data about Duplication. See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=4959
Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]
dengzhhu653 commented on code in PR #4959: URL: https://github.com/apache/hive/pull/4959#discussion_r1506862344 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException { return; } + String nodeValue0 = "?"; // if Filter.g does date parsing for quoted strings, we'd need to verify there's no // type mismatch when string col is filtered by a string that looks like date. - if (colType == FilterType.Date && valType == FilterType.String) { -// Filter.g cannot parse a quoted date; try to parse date here too. + if (colType == FilterType.Date) { try { - nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue); + // check the nodeValue is a valid date + nodeValue = MetaStoreUtils.convertDateToString( Review Comment: This is for converting the string date to a timezone-based string date; we need it in the direct SQL: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java#L263-L269
Re: [PR] HIVE-12371: Use DriverManager.getLoginTimeout() as fallback [hive]
github-actions[bot] commented on PR #4964: URL: https://github.com/apache/hive/pull/4964#issuecomment-1970152782 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.
Re: [PR] Update Hive version in Docker README [hive]
sonarcloud[bot] commented on PR #5105: URL: https://github.com/apache/hive/pull/5105#issuecomment-1969803673 **Quality Gate passed** Issues: 0 New issues, 0 Accepted issues. Measures: 0 Security Hotspots; no data about Coverage; no data about Duplication. See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5105
[PR] Update Hive version in Docker README [hive]
mertyyanik opened a new pull request, #5105: URL: https://github.com/apache/hive/pull/5105 ### What changes were proposed in this pull request? While the quickstart page for Docker was up to date, the README was not. I added the updates from that page to the README. Quickstart page: https://hive.apache.org/developement/quickstart/ ### Does this PR introduce _any_ user-facing change? It lets users see an up-to-date README. ### Is the change a dependency upgrade? No. ### How was this patch tested? Since it only changes the README, I confirmed that it renders properly as Markdown.
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
sonarcloud[bot] commented on PR #5103: URL: https://github.com/apache/hive/pull/5103#issuecomment-1969626662 **Quality Gate passed** Issues: 2 New issues, 0 Accepted issues. Measures: 1 Security Hotspot; no data about Coverage; no data about Duplication. See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5103
Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]
deniskuzZ commented on code in PR #4959: URL: https://github.com/apache/hive/pull/4959#discussion_r1506164080 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -1505,14 +1507,9 @@ public void visit(LeafNode node) throws MetaException { params.add(catName.toLowerCase()); } tableValue += " then " + tableValue0 + " else null end)"; - -if (valType == FilterType.Date) { - tableValue = dbType.toDate(tableValue); -} else if (valType == FilterType.Timestamp) { - tableValue = dbType.toTimestamp(tableValue); -} } - if (!node.isReverseOrder) { + + if (!node.isReverseOrder && nodeValue != null) { Review Comment:
Re: [PR] HIVE-27775: DirectSQL and JDO results are different when fetching par… [hive]
deniskuzZ commented on code in PR #4959: URL: https://github.com/apache/hive/pull/4959#discussion_r1506152341 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -1424,30 +1422,36 @@ public void visit(LeafNode node) throws MetaException { return; } + String nodeValue0 = "?"; // if Filter.g does date parsing for quoted strings, we'd need to verify there's no // type mismatch when string col is filtered by a string that looks like date. - if (colType == FilterType.Date && valType == FilterType.String) { -// Filter.g cannot parse a quoted date; try to parse date here too. + if (colType == FilterType.Date) { try { - nodeValue = MetaStoreUtils.convertStringToDate((String)nodeValue); + // check the nodeValue is a valid date + nodeValue = MetaStoreUtils.convertDateToString( Review Comment: PartFilterVisitor now returns a String date representation after validation, doesn't it? The question wasn't about changing the returned type, but why we need these 2 conversions: nodeValue = MetaStoreUtils.convertDateToString(MetaStoreUtils.convertStringToDate((String) nodeValue)); If we really need them for some reason, could we extract that logic into `MetaStoreUtils.normalizeDate()`?
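The `normalizeDate()` extraction suggested above could look roughly like this. This is a sketch under assumptions: it uses `java.time` directly rather than the real `MetaStoreUtils` conversion methods, and `normalizeDate` is the proposed helper, not an existing one.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateNormalizer {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ISO_LOCAL_DATE;

    // Proposed normalizeDate(): a parse-then-format round trip that both
    // validates the literal and yields a canonical yyyy-MM-dd string, in place
    // of calling convertDateToString(convertStringToDate(...)) inline.
    static String normalizeDate(String value) {
        try {
            return LocalDate.parse(value.trim(), FMT).format(FMT);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Invalid date literal: " + value, e);
        }
    }
}
```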
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
sonarcloud[bot] commented on PR #5076: URL: https://github.com/apache/hive/pull/5076#issuecomment-1969109963 **Quality Gate passed** Issues: 5 New issues, 0 Accepted issues. Measures: 0 Security Hotspots; no data about Coverage; no data about Duplication. See analysis details on SonarCloud: https://sonarcloud.io/dashboard?id=apache_hive=5076
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on PR #5103: URL: https://github.com/apache/hive/pull/5103#issuecomment-1968997117 > Minor Stuff, else looks good Thanks a lot, addressed your comments.
Re: [PR] HIVE-25972: HIVE_VECTORIZATION_USE_ROW_DESERIALIZE in hiveconf.java imply default value is false,in fact the default value is 'true' [hive]
SourabhBadhya merged PR #3824: URL: https://github.com/apache/hive/pull/3824
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on code in PR #5103: URL: https://github.com/apache/hive/pull/5103#discussion_r1505911098 ## ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.reexec; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.QueryState; +import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException; +import org.apache.hadoop.hive.ql.hooks.HookContext; +import org.junit.Assert; +import org.junit.Test; + +public class TestReExecuteLostAMQueryPlugin { + + @Test + public void testRetryOnUnmanagedAmFailure() throws Exception { +ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin(); +ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook(); + +HookContext context = new HookContext(null, QueryState.getNewQueryState(new HiveConf(), null), null, null, null, +null, null, null, null, false, null, null); +context.setHookType(HookContext.HookType.ON_FAILURE_HOOK); +context.setException(new TezRuntimeException("dag_0_0", "AM record not found (likely died)")); + +hook.run(context); + +Assert.assertEquals(true, plugin.shouldReExecute(1)); + } + + @Test + public void testRetryOnNoCurrentDAGException() throws Exception { Review Comment: ack
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on code in PR #5103: URL: https://github.com/apache/hive/pull/5103#discussion_r1505909179 ## ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java: ## @@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception { if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) { Throwable exception = hookContext.getException(); -if (exception != null && exception.getMessage() != null) { +if (!(exception instanceof TezRuntimeException)) { + LOG.info("Exception is not a TezRuntimeException, no need to check further with ReExecuteLostAMQueryPlugin"); + return; +} + +TezRuntimeException tre = (TezRuntimeException)exception; + +if (tre != null && tre.getMessage() != null) { + dagIds.add(tre.getDagId()); // When HS2 does not manage the AMs, tez AMs are registered with zookeeper and HS2 discovers it, // failure of unmanaged AMs will throw AM record not being found in zookeeper. String unmanagedAMFailure = "AM record not found (likely died)"; - if (lostAMContainerErrorPattern.matcher(exception.getMessage()).matches() - || exception.getMessage().contains(unmanagedAMFailure)) { + // DAG lost in the scenario described at TEZ-4543 + String dagLostFailure = "No running DAG at present"; + + if (lostAMContainerErrorPattern.matcher(tre.getMessage()).matches() + || tre.getMessage().contains(unmanagedAMFailure) + || tre.getMessage().contains(dagLostFailure)) { retryPossible = true; } - LOG.info("Got exception message: {} retryPossible: {}", exception.getMessage(), retryPossible); + LOG.info("Got exception message: {} retryPossible: {}, dags seen so far: {}", tre.getMessage(), retryPossible, Review Comment: ack
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on code in PR #5103: URL: https://github.com/apache/hive/pull/5103#discussion_r1505907827 ## ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.reexec; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.QueryState; +import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException; +import org.apache.hadoop.hive.ql.hooks.HookContext; +import org.junit.Assert; +import org.junit.Test; + +public class TestReExecuteLostAMQueryPlugin { + + @Test + public void testRetryOnUnmanagedAmFailure() throws Exception { +ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin(); +ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook(); + +HookContext context = new HookContext(null, QueryState.getNewQueryState(new HiveConf(), null), null, null, null, +null, null, null, null, false, null, null); +context.setHookType(HookContext.HookType.ON_FAILURE_HOOK); +context.setException(new TezRuntimeException("dag_0_0", "AM record not found (likely died)")); + +hook.run(context); + +Assert.assertEquals(true, plugin.shouldReExecute(1)); Review Comment: ack ## ql/src/test/org/apache/hadoop/hive/ql/reexec/TestReExecuteLostAMQueryPlugin.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.reexec; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.QueryState; +import org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException; +import org.apache.hadoop.hive.ql.hooks.HookContext; +import org.junit.Assert; +import org.junit.Test; + +public class TestReExecuteLostAMQueryPlugin { + + @Test + public void testRetryOnUnmanagedAmFailure() throws Exception { +ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin(); +ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook(); + +HookContext context = new HookContext(null, QueryState.getNewQueryState(new HiveConf(), null), null, null, null, +null, null, null, null, false, null, null); +context.setHookType(HookContext.HookType.ON_FAILURE_HOOK); +context.setException(new TezRuntimeException("dag_0_0", "AM record not found (likely died)")); + +hook.run(context); + +Assert.assertEquals(true, plugin.shouldReExecute(1)); + } + + @Test + public void testRetryOnNoCurrentDAGException() throws Exception { +ReExecuteLostAMQueryPlugin plugin = new ReExecuteLostAMQueryPlugin(); +ReExecuteLostAMQueryPlugin.LocalHook hook = plugin.new LocalHook(); + +HookContext context = new HookContext(null, QueryState.getNewQueryState(new HiveConf(), null), null, null, null, +null, null, null, null, false, null, null); +context.setHookType(HookContext.HookType.ON_FAILURE_HOOK); +context.setException(new TezRuntimeException("dag_0_0", "No running DAG at present")); + +hook.run(context); + +Assert.assertEquals(true, plugin.shouldReExecute(1)); Review Comment: ack
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on code in PR #5103: URL: https://github.com/apache/hive/pull/5103#discussion_r1505906262 ## ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java: ## @@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception { if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) { Throwable exception = hookContext.getException(); -if (exception != null && exception.getMessage() != null) { +if (!(exception instanceof TezRuntimeException)) { + LOG.info("Exception is not a TezRuntimeException, no need to check further with ReExecuteLostAMQueryPlugin"); + return; +} + +TezRuntimeException tre = (TezRuntimeException)exception; + +if (tre != null && tre.getMessage() != null) { Review Comment: ack ## ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java: ## @@ -33,6 +36,9 @@ public class ReExecuteLostAMQueryPlugin implements IReExecutionPlugin { private static final Logger LOG = LoggerFactory.getLogger(ReExecuteLostAMQueryPlugin.class); private boolean retryPossible; + // a list to track DAG ids seen by this re-execution plugin during the same query + // it can help a lot with identifying the previous DAGs in case of retries + private List dagIds = new ArrayList<>(); Review Comment: ack
Re: [PR] HIVE-28093: Re-execute DAG in case of NoCurrentDAGException [hive]
abstractdog commented on code in PR #5103: URL: https://github.com/apache/hive/pull/5103#discussion_r1505901461 ## ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecuteLostAMQueryPlugin.java: ## @@ -44,15 +50,28 @@ public void run(HookContext hookContext) throws Exception { if (hookContext.getHookType() == HookContext.HookType.ON_FAILURE_HOOK) { Throwable exception = hookContext.getException(); -if (exception != null && exception.getMessage() != null) { +if (!(exception instanceof TezRuntimeException)) { + LOG.info("Exception is not a TezRuntimeException, no need to check further with ReExecuteLostAMQueryPlugin"); + return; +} + +TezRuntimeException tre = (TezRuntimeException)exception; + +if (tre != null && tre.getMessage() != null) { + dagIds.add(tre.getDagId()); // When HS2 does not manage the AMs, tez AMs are registered with zookeeper and HS2 discovers it, // failure of unmanaged AMs will throw AM record not being found in zookeeper. String unmanagedAMFailure = "AM record not found (likely died)"; - if (lostAMContainerErrorPattern.matcher(exception.getMessage()).matches() - || exception.getMessage().contains(unmanagedAMFailure)) { + // DAG lost in the scenario described at TEZ-4543 + String dagLostFailure = "No running DAG at present"; Review Comment: yes, absolutely
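The message-based retry decision discussed in this thread boils down to a check like the following. This is a simplified sketch: `LOST_AM_CONTAINER` is a stand-in for Hive's actual `lostAMContainerErrorPattern`, whose real regex is not shown in this thread, and the two string literals are the ones quoted in the diff above.

```java
import java.util.regex.Pattern;

public class RetryDecision {

    // Stand-in for Hive's lostAMContainerErrorPattern; the real regex differs.
    private static final Pattern LOST_AM_CONTAINER =
        Pattern.compile(".*AM Container for .* exited.*", Pattern.DOTALL);

    // Retry when the failure message matches any of the known transient cases:
    // a lost AM container, an unmanaged AM whose zookeeper record vanished,
    // or the lost-DAG scenario described at TEZ-4543.
    static boolean retryPossible(String message) {
        if (message == null) {
            return false;
        }
        return LOST_AM_CONTAINER.matcher(message).matches()
            || message.contains("AM record not found (likely died)")
            || message.contains("No running DAG at present");
    }
}
```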
Re: [PR] HIVE-28083: Enable HMS client cache and HMS query cache for Explain p… [hive]
zabetak closed pull request #5092: HIVE-28083: Enable HMS client cache and HMS query cache for Explain p… URL: https://github.com/apache/hive/pull/5092
Re: [PR] HIVE-28094: Improve HMS client cache and query cache performance for … [hive]
sonarcloud[bot] commented on PR #5102: URL: https://github.com/apache/hive/pull/5102#issuecomment-1968732069 ## [![Quality Gate Passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/qg-passed-20px.png 'Quality Gate Passed')](https://sonarcloud.io/dashboard?id=apache_hive=5102) **Quality Gate passed** Issues ![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png '') [9 New issues](https://sonarcloud.io/project/issues?id=apache_hive=5102=false=true) Measures ![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/passed-16px.png '') [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=5102=false=true) ![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png '') No data about Coverage ![](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/no-data-16px.png '') No data about Duplication [See analysis details on SonarCloud](https://sonarcloud.io/dashboard?id=apache_hive=5102)
Re: [PR] HIVE-27848 - TestCrudCompactorOnTez.secondCompactionShouldBeRefusedBe… [hive]
zabetak commented on code in PR #4859: URL: https://github.com/apache/hive/pull/4859#discussion_r1505717881 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java: ## @@ -295,4 +324,230 @@ public static LockRequest createLockRequest(HiveConf conf, CompactionInfo ci, lo !conf.getBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK)); return requestBuilder.build(); } + + private static CompactionResponse requestCompaction(CompactionInfo ci, String runAs, String hostname, + String runtimeVersion, TxnStore txnHandler) throws MetaException { +CompactionRequest compactionRequest = new CompactionRequest(ci.dbname, ci.tableName, ci.type); +if (ci.partName != null) + compactionRequest.setPartitionname(ci.partName); +compactionRequest.setRunas(runAs); +if (StringUtils.isEmpty(ci.initiatorId)) { + compactionRequest.setInitiatorId(hostname + "-" + Thread.currentThread().getId()); +} else { + compactionRequest.setInitiatorId(ci.initiatorId); +} +compactionRequest.setInitiatorVersion(runtimeVersion); +compactionRequest.setPoolName(ci.poolName); +LOG.info("Requesting compaction: " + compactionRequest); +CompactionResponse resp = txnHandler.compact(compactionRequest); +if (resp.isAccepted()) { + ci.id = resp.getId(); +} +return resp; + } + + private static CompactionType determineCompactionType(CompactionInfo ci, AcidDirectory dir, + Map tblProperties, long baseSize, long deltaSize, HiveConf conf) { +boolean noBase = false; +List deltas = dir.getCurrentDirectories(); +if (baseSize == 0 && deltaSize > 0) { + noBase = true; +} else { + String deltaPctProp = + tblProperties.get(COMPACTOR_THRESHOLD_PREFIX + HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD); + float deltaPctThreshold = deltaPctProp == null ? 
HiveConf.getFloatVar(conf, + HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD) : Float.parseFloat(deltaPctProp); + boolean bigEnough = (float) deltaSize / (float) baseSize > deltaPctThreshold; + boolean multiBase = dir.getObsolete().stream().anyMatch(path -> path.getName().startsWith(AcidUtils.BASE_PREFIX)); + + boolean initiateMajor = bigEnough || (deltaSize == 0 && multiBase); + if (LOG.isDebugEnabled()) { +StringBuilder msg = new StringBuilder("delta size: "); +msg.append(deltaSize); +msg.append(" base size: "); +msg.append(baseSize); +msg.append(" multiBase "); +msg.append(multiBase); +msg.append(" deltaSize "); +msg.append(deltaSize); +msg.append(" threshold: "); +msg.append(deltaPctThreshold); +msg.append(" delta/base ratio > ").append(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD.varname) +.append(": "); +msg.append(bigEnough); +msg.append("."); +if (!initiateMajor) { + msg.append("not"); +} +msg.append(" initiating major compaction."); +LOG.debug(msg.toString()); + } + if (initiateMajor) +return CompactionType.MAJOR; +} + +String deltaNumProp = +tblProperties.get(COMPACTOR_THRESHOLD_PREFIX + HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD); +int deltaNumThreshold = deltaNumProp == null ? HiveConf.getIntVar(conf, +HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD) : Integer.parseInt(deltaNumProp); +boolean enough = deltas.size() > deltaNumThreshold; +if (!enough) { + LOG.debug( + "Not enough deltas to initiate compaction for table=" + ci.tableName + "partition=" + ci.partName + + ". Found: " + deltas.size() + " deltas, threshold is " + deltaNumThreshold); + return null; +} +// If there's no base file, do a major compaction +LOG.debug("Found " + deltas.size() + " delta files, and " + (noBase ? "no" : "has") + " base," + "requesting " ++ (noBase ? "major" : "minor") + " compaction"); + +return noBase || !isMinorCompactionSupported(conf, tblProperties, +dir) ? 
CompactionType.MAJOR : CompactionType.MINOR; + } + + private static long getBaseSize(AcidDirectory dir) throws IOException { +long baseSize = 0; +if (dir.getBase() != null) { + baseSize = getDirSize(dir.getFs(), dir.getBase()); +} else { + for (HadoopShims.HdfsFileStatusWithId origStat : dir.getOriginalFiles()) { +baseSize += origStat.getFileStatus().getLen(); + } +} +return baseSize; + } + + private static long getDirSize(FileSystem fs, AcidUtils.ParsedDirectory dir) throws IOException { +return dir.getFiles(fs, Ref.from(false)).stream().map(HadoopShims.HdfsFileStatusWithId::getFileStatus) +.mapToLong(FileStatus::getLen).sum(); + } + + private static CompactionType checkForCompaction(final CompactionInfo ci, final ValidWriteIdList writeIds, + final StorageDescriptor sd, final Map
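The threshold logic quoted in the hunk above condenses to two decisions: a size-ratio check for major compaction and a delta-count check for whether to compact at all. A simplified sketch (parameter names are illustrative, not the actual Hive configuration keys; it omits the multi-base and obsolete-directory handling of the real `determineCompactionType`):

```java
public class CompactionTypeRule {
    enum CompactionType { MAJOR, MINOR }

    /**
     * @param baseSize       total size of base files; 0 with deltas present means "no base yet"
     * @param deltaSize      total size of delta files
     * @param deltaCount     number of delta directories
     * @param pctThreshold   delta/base ratio above this triggers MAJOR (illustrative default 0.1f)
     * @param numThreshold   more deltas than this triggers a compaction at all
     * @param minorSupported whether the format/config supports MINOR compaction
     * @return compaction type to request, or null if nothing to do
     */
    static CompactionType decide(long baseSize, long deltaSize, int deltaCount,
                                 float pctThreshold, int numThreshold, boolean minorSupported) {
        boolean noBase = baseSize == 0 && deltaSize > 0;
        if (!noBase) {
            // delta/base ratio big enough -> the base is stale, rewrite it
            boolean bigEnough = (float) deltaSize / (float) baseSize > pctThreshold;
            if (bigEnough) {
                return CompactionType.MAJOR;
            }
        }
        if (deltaCount <= numThreshold) {
            return null; // not enough deltas to bother
        }
        // Without a base file (or without MINOR support) only MAJOR makes sense.
        return (noBase || !minorSupported) ? CompactionType.MAJOR : CompactionType.MINOR;
    }
}
```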
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505610108 ## ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java: ## @@ -370,7 +371,8 @@ private InputSplit[] getCombineSplits(JobConf job, int numSplits, PartitionDesc part = HiveFileFormatUtils.getFromPathRecursively( pathToPartitionInfo, path, IOPrepareCache.get().allocatePartitionDescMap()); TableDesc tableDesc = part.getTableDesc(); - if ((tableDesc != null) && tableDesc.isNonNative()) { + boolean useDefaultFileFormat = part.getInputFileFormatClass().equals(tableDesc.getInputFileFormatClass()); Review Comment: Done.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505609448 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: ## @@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException { LOG.info("CommitTask found no serialized table in config for table: {}.", output); } }, IOException.class); + + // Merge task has merged several files into one. Hence we need to remove the stale files. + // At this stage the file is written and task-committed, but the old files are still present. + if (CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) { +MapWork mrwork = Utilities.getMapWork(jobConf); +if (mrwork != null) { + List mergedPaths = mrwork.getInputPaths(); + if (CollectionUtils.isNotEmpty(mergedPaths)) { +Tasks.foreach(mergedPaths) +.retry(3) +.stopOnFailure() +.throwFailureWhenFinished() Review Comment: I think we should not fail. Removed `stopOnFailure` and `throwFailureWhenFinished`, so that cleanup failures no longer fail the task.
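The agreed behaviour — retry the cleanup a few times but never fail the already-committed task — can be illustrated without Iceberg's `Tasks` utility. A hedged sketch in plain Java (`deleteAll` and its retry count are illustrative, not Hive API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BestEffortCleanup {
    /**
     * Try to delete each stale path up to maxAttempts times.
     * Returns the number of paths that are gone afterwards (deleted or already absent).
     * Failures are logged and swallowed: a leftover stale file must not fail a committed task.
     */
    static int deleteAll(List<Path> stalePaths, int maxAttempts) {
        int gone = 0;
        for (Path p : stalePaths) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    Files.deleteIfExists(p); // no-op (returns false) if the file is already gone
                    gone++;
                    break;
                } catch (IOException e) {
                    if (attempt == maxAttempts) {
                        System.err.println("Giving up on " + p + ": " + e);
                    }
                }
            }
        }
        return gone;
    }
}
```

This mirrors the design choice in the review: `retry(n)` stays, while `stopOnFailure()` and `throwFailureWhenFinished()` are dropped so cleanup errors degrade to warnings.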
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505607838 ## ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java: ## @@ -1720,6 +1723,41 @@ protected static MoveWork mergeMovePaths(Path condInputPath, MoveWork linkedMove return newWork; } + private static void setCustomStorageHandlerPropertiesForMerge(ConditionalResolverMergeFilesCtx mrCtx, MoveWork work) { Review Comment: Renamed it to `setCustomStorageHandlerAndPropertiesForMerge`. ## ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java: ## @@ -1720,6 +1723,41 @@ protected static MoveWork mergeMovePaths(Path condInputPath, MoveWork linkedMove return newWork; } + private static void setCustomStorageHandlerPropertiesForMerge(ConditionalResolverMergeFilesCtx mrCtx, MoveWork work) { Review Comment: Renamed it to `setStorageHandlerAndPropertiesForMerge`.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505607087 ## ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java: ## @@ -147,18 +171,23 @@ public List> getTasks(HiveConf conf, Object objCtx) { Path dirPath = new Path(dirName); FileSystem inpFs = dirPath.getFileSystem(conf); DynamicPartitionCtx dpCtx = ctx.getDPCtx(); + HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, ctx.getStorageHandlerClass()); + boolean dirExists = inpFs.exists(dirPath); + boolean useCustomStorageHandler = storageHandler != null && storageHandler.supportsMergeFiles(); - if (inpFs.exists(dirPath)) { + MapWork work = null; + if (dirExists || useCustomStorageHandler) { Review Comment: Removed.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505606712 ## ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java: ## @@ -254,18 +286,26 @@ private void generateActualTasks(HiveConf conf, List> resTsks, long trgtSize, long avgConditionSize, Task mvTask, Task mrTask, Task mrAndMvTask, Path dirPath, FileSystem inpFs, ConditionalResolverMergeFilesCtx ctx, MapWork work, int dpLbLevel, - boolean manifestFilePresent) - throws IOException { + boolean manifestFilePresent, HiveStorageHandler storageHandler) + throws IOException, ClassNotFoundException { DynamicPartitionCtx dpCtx = ctx.getDPCtx(); List statusList; -Map> manifestDirToFile = new HashMap<>(); +Map> parentDirToFile = new HashMap<>(); +boolean useCustomStorageHandler = storageHandler != null && storageHandler.supportsMergeFiles(); +StorageHandlerMergeProperties mergeProperties = useCustomStorageHandler ? + storageHandler.getStorageHandlerMergeProperties(ctx.getCustomStorageHandlerProps()) : null; if (manifestFilePresent) { // Get the list of files from manifest file. List fileStatuses = getManifestFilePaths(conf, dirPath); // Setup the work to include all the files present in the manifest. setupWorkWhenUsingManifestFile(work, fileStatuses, dirPath, false); - manifestDirToFile = getManifestDirs(inpFs, fileStatuses); - statusList = new ArrayList<>(manifestDirToFile.keySet()); + parentDirToFile = getParentDirToFileMap(inpFs, fileStatuses); + statusList = new ArrayList<>(parentDirToFile.keySet()); +} else if (useCustomStorageHandler) { + List fileStatuses = storageHandler.getMergeInputFiles(ctx.getCustomStorageHandlerProps()); + setupWorkWhenUsingCustomHandler(work, dirPath, mergeProperties); + parentDirToFile = getParentDirToFileMap(inpFs, fileStatuses); + statusList = new ArrayList<>(parentDirToFile.keySet()); Review Comment: Done. 
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505606288 ## ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java: ## @@ -125,6 +133,22 @@ public ListBucketingCtx getLbCtx() { public void setLbCtx(ListBucketingCtx lbCtx) { this.lbCtx = lbCtx; } + +public void setCustomStorageHandlerProps(Properties properties) { Review Comment: Removed. ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -2021,4 +2023,34 @@ public List getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta throw new SemanticException(String.format("Error while fetching the partitions due to: %s", e)); } } + + @Override + public boolean supportsMergeFiles() { +return true; + } + + @Override + public List getMergeInputFiles(Properties properties) throws IOException { +String tableName = properties.getProperty(Catalogs.NAME); +String snapshotRef = properties.getProperty(Catalogs.SNAPSHOT_REF); +Configuration configuration = SessionState.getSessionConf(); +List originalContextList = generateJobContext(configuration, tableName, snapshotRef); +List jobContextList = originalContextList.stream() +.map(TezUtil::enrichContextWithVertexId) +.collect(Collectors.toList()); +if (originalContextList.isEmpty()) { + return Collections.emptyList(); +} +List dataFiles = Lists.newArrayList(); +for (DataFile dataFile : new HiveIcebergOutputCommitter().getWrittenFiles(jobContextList)) { + FileSystem fs = new Path(dataFile.path().toString()).getFileSystem(conf); + dataFiles.add(fs.getFileStatus(new Path(dataFile.path().toString()))); +} +return dataFiles; + } + + @Override + public StorageHandlerMergeProperties getStorageHandlerMergeProperties(Properties properties) { Review Comment: Renamed.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505604310 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java: ## @@ -747,4 +750,18 @@ default List getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.T throw new UnsupportedOperationException("Storage handler does not support getting partitions by expression " + "for a table."); } + + default boolean supportsMergeFiles() { +return false; + } + + default List getMergeInputFiles(Properties properties) throws IOException { +throw new UnsupportedOperationException("Storage handler does not support getting merge input files " +"for a table."); + } + + default StorageHandlerMergeProperties getStorageHandlerMergeProperties(Properties properties) { Review Comment: Renamed it to `getMergeProperties`.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505603811 ## ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java: ## @@ -691,6 +693,18 @@ Map> removeScheme(Map> pathToAliases) { public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException { if (!(split instanceof CombineHiveInputSplit)) { + if (split instanceof FileSplit) { Review Comment: If the conditions are satisfied, we use the `inputFormat` to get the record reader; otherwise we fall back as before.
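The fallback described in this answer reduces to an equality gate on input-format classes. A minimal sketch (names are illustrative; the real code compares `partitionDesc.getInputFileFormatClass()` against the table default resolved from the split's path):

```java
public class ReaderDispatch {
    /**
     * Decide whether a bare FileSplit should be read with the partition's own input format
     * rather than routed through the combine reader.
     */
    static boolean useOwnFormat(Class<?> partitionInputFormat, Class<?> tableDefaultInputFormat,
                                boolean isCombineSplit) {
        if (isCombineSplit) {
            return false; // combine splits always go through the combine record reader
        }
        // A partition whose input format differs from the table default cannot be
        // combined safely, so it must be read with its own format's record reader.
        return !partitionInputFormat.equals(tableDefaultInputFormat);
    }
}
```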
Re: [PR] HIVE-28084: Iceberg: COW fix for Merge operation [hive]
deniskuzZ merged PR #5088: URL: https://github.com/apache/hive/pull/5088
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505602500 ## ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java: ## @@ -691,6 +693,18 @@ Map> removeScheme(Map> pathToAliases) { public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException { if (!(split instanceof CombineHiveInputSplit)) { + if (split instanceof FileSplit) { +Map pathToPartitionInfo = Utilities.getMapWork(job).getPathToPartitionInfo(); +Path path = ((FileSplit) split).getPath(); +PartitionDesc partitionDesc = HiveFileFormatUtils.getFromPathRecursively(pathToPartitionInfo, path, +IOPrepareCache.get().getPartitionDescMap()); +boolean useDefaultFileFormat = partitionDesc.getInputFileFormatClass() + .equals(partitionDesc.getTableDesc().getInputFileFormatClass()); Review Comment: Done.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505602116 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.Properties; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat; +import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcSerde; +import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat; +import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat; +import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; +import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties; +import org.apache.hadoop.hive.serde2.avro.AvroSerDe; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.mr.Catalogs; + +public class IcebergStorageHandlerMergeProperties implements StorageHandlerMergeProperties { Review Comment: Thanks for the info. Added code to use it now. ## ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java: ## @@ -370,7 +371,8 @@ private InputSplit[] getCombineSplits(JobConf job, int numSplits, PartitionDesc part = HiveFileFormatUtils.getFromPathRecursively( pathToPartitionInfo, path, IOPrepareCache.get().allocatePartitionDescMap()); TableDesc tableDesc = part.getTableDesc(); - if ((tableDesc != null) && tableDesc.isNonNative()) { + boolean useDefaultFileFormat = part.getInputFileFormatClass().equals(tableDesc.getInputFileFormatClass()); + if ((tableDesc != null) && tableDesc.isNonNative() && useDefaultFileFormat) { Review Comment: Removed.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505601391 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.Properties; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat; +import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcSerde; +import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat; +import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat; +import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; +import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties; +import org.apache.hadoop.hive.serde2.avro.AvroSerDe; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.mr.Catalogs; + +public class IcebergStorageHandlerMergeProperties implements StorageHandlerMergeProperties { + + private final Properties properties; + + IcebergStorageHandlerMergeProperties(Properties properties) { +this.properties = properties; + } + + public Path getTmpLocation() { +String location = properties.getProperty(Catalogs.LOCATION); +return new Path(location + "/data/"); + } + + public String getInputFileFormatClassName() { +FileFormat fileFormat = FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT)); +if (fileFormat == FileFormat.ORC) { Review Comment: Using `StorageFormatDescriptor` in the `StorageHandlerMergeProperties` itself. Done.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505599549 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -2021,4 +2023,34 @@ public List getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta throw new SemanticException(String.format("Error while fetching the partitions due to: %s", e)); } } + + @Override + public boolean supportsMergeFiles() { +return true; + } + + @Override + public List getMergeInputFiles(Properties properties) throws IOException { +String tableName = properties.getProperty(Catalogs.NAME); +String snapshotRef = properties.getProperty(Catalogs.SNAPSHOT_REF); +Configuration configuration = SessionState.getSessionConf(); +List originalContextList = generateJobContext(configuration, tableName, snapshotRef); +List jobContextList = originalContextList.stream() +.map(TezUtil::enrichContextWithVertexId) +.collect(Collectors.toList()); +if (originalContextList.isEmpty()) { + return Collections.emptyList(); +} +List dataFiles = Lists.newArrayList(); +for (DataFile dataFile : new HiveIcebergOutputCommitter().getWrittenFiles(jobContextList)) { Review Comment: > why do you need new instance of HiveIcebergOutputCommitter, can't you make the method static? Maybe we should move it to Iceberg util class Kept it in the same place so that we could make use of existing helper functions already present in HiveIcebergOutputCommitter. > Why do you need to enrichContextWithVertexId again inside of getWrittenFiles? Removed and doing only once. > Why don't we return List dataFiles right away, but add 1 more foreach with transformation? Added functionality to directly convert it into `FileStatus` objects.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505595228

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -720,4 +744,42 @@ private static FilesForCommit readFileForCommit(String fileForCommitLocation, Fi

```java
    throw new NotFoundException("Can not read or parse committed file: %s", fileForCommitLocation);
  }
}

public List<DataFile> getWrittenFiles(List<JobContext> jobContexts) throws IOException {
  List<JobContext> jobContextList = jobContexts.stream()
      .map(TezUtil::enrichContextWithVertexId)
      .collect(Collectors.toList());
  List<OutputTable> outputs = collectOutputs(jobContextList);
  ExecutorService fileExecutor = fileExecutor(jobContextList.get(0).getJobConf());
  ExecutorService tableExecutor = tableExecutor(jobContextList.get(0).getJobConf(), outputs.size());
  Collection<DataFile> dataFiles = new ConcurrentLinkedQueue<>();
  try {
    Tasks.foreach(outputs.stream().flatMap(kv -> kv.jobContexts.stream()
        .map(jobContext -> new SimpleImmutableEntry<>(kv.table, jobContext))))
        .suppressFailureWhenFinished()
        .executeWith(tableExecutor)
        .onFailure((output, exc) -> LOG.warn("Failed to retrieve merge input file for the table {}", output, exc))
        .run(output -> {
          JobContext jobContext = output.getValue();
          JobConf jobConf = jobContext.getJobConf();
          LOG.info("Cleaning job for jobID: {}, table: {}", jobContext.getJobID(), output);

          Table table = output.getKey();
          String jobLocation = generateJobLocation(table.location(), jobConf, jobContext.getJobID());
          // list jobLocation to get number of forCommit files
          // we do this because map/reduce num in jobConf is unreliable
          // and we have no access to vertex status info
          int numTasks = listForCommits(jobConf, jobLocation).size();
          FilesForCommit results = collectResults(numTasks, fileExecutor, table.location(), jobContext,
              table.io(), false);
          dataFiles.addAll(results.dataFiles());
        }, IOException.class);
  } finally {
    fileExecutor.shutdown();
    if (tableExecutor != null) {
```

Review Comment: `fileExecutor` is always created, whereas `tableExecutor` can be null.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
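The review note above — `fileExecutor` is always created while `tableExecutor` can be null — boils down to a null-guarded shutdown in `finally`. A minimal standalone sketch of that pattern (class and method names here are illustrative stand-ins, not the PR's actual code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorShutdownSketch {

  // Stand-in for the PR's tableExecutor(...) factory: returns null when a
  // dedicated pool is not needed (e.g. only one table is involved).
  static ExecutorService tableExecutor(int tableCount) {
    return tableCount > 1 ? Executors.newFixedThreadPool(tableCount) : null;
  }

  public static void main(String[] args) {
    ExecutorService fileExecutor = Executors.newFixedThreadPool(2); // always created
    ExecutorService tableExecutor = tableExecutor(1);               // null in this run
    try {
      // ... submit work to the executors here ...
    } finally {
      fileExecutor.shutdown();      // non-null by construction, no guard needed
      if (tableExecutor != null) {  // the possibly-null executor must be guarded
        tableExecutor.shutdown();
      }
    }
    System.out.println(tableExecutor == null);
  }
}
```

Without the guard, `tableExecutor.shutdown()` in the `finally` block would throw a `NullPointerException` and mask whatever exception the `try` block was propagating.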
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505593646

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:

@@ -178,7 +178,7 @@ public RecordReader> getRecordReader(InputSplit split, J

```diff
 @Override
 public boolean shouldSkipCombine(Path path, Configuration conf) {
-  return true;
+  return false;
 }
```

Review Comment: This method is used during split generation in CombineRecordReader (during execution of merge tasks). It is a flag method that indicates whether merging is supported by the file format. Setting `hive.merge.tezfiles=false` will still disable the merge functionality.

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {

```java
        LOG.info("CommitTask found no serialized table in config for table: {}.", output);
      }
    }, IOException.class);

    // The merge task has merged several files into one, so we need to remove the stale files.
```

Review Comment: Done.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505590111

## ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:

@@ -512,7 +555,30 @@ private void setupWorkWhenUsingManifestFile(MapWork mapWork, List fi

```java
    mapWork.setUseInputPathsDirectly(true);
  }

  private void setupWorkWhenUsingCustomHandler(MapWork mapWork, Path dirPath,
      StorageHandlerMergeProperties mergeProperties) throws ClassNotFoundException {
    Map<String, Operator<? extends OperatorDesc>> aliasToWork = mapWork.getAliasToWork();
    Map<Path, PartitionDesc> pathToPartitionInfo = mapWork.getPathToPartitionInfo();
    Operator<? extends OperatorDesc> op = aliasToWork.get(dirPath.toString());
    PartitionDesc partitionDesc = pathToPartitionInfo.get(dirPath);
    Path tmpDir = mergeProperties.getTmpLocation();
```

Review Comment: `mergeProperties` is present when it is used under a storage handler that supports it. The boolean `customStorageHandler`, if true, should ensure that `mergeProperties` is not null.
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505587606

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergStorageHandlerMergeProperties.java:

@@ -0,0 +1,86 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.iceberg.mr.hive;

import java.util.Properties;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat;
import org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
import org.apache.hadoop.hive.ql.plan.StorageHandlerMergeProperties;
import org.apache.hadoop.hive.serde2.avro.AvroSerDe;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.mr.Catalogs;

public class IcebergStorageHandlerMergeProperties implements StorageHandlerMergeProperties {

  private final Properties properties;

  IcebergStorageHandlerMergeProperties(Properties properties) {
    this.properties = properties;
  }

  public Path getTmpLocation() {
    String location = properties.getProperty(Catalogs.LOCATION);
    return new Path(location + "/data/");
  }

  public String getInputFileFormatClassName() {
    FileFormat fileFormat = FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
    if (fileFormat == FileFormat.ORC) {
      return OrcInputFormat.class.getName();
    } else if (fileFormat == FileFormat.PARQUET) {
      return MapredParquetInputFormat.class.getName();
    } else if (fileFormat == FileFormat.AVRO) {
      return AvroContainerInputFormat.class.getName();
    }
    return null;
  }

  public String getOutputFileFormatClassName() {
    FileFormat fileFormat = FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
    if (fileFormat == FileFormat.ORC) {
      return OrcOutputFormat.class.getName();
    } else if (fileFormat == FileFormat.PARQUET) {
      return MapredParquetOutputFormat.class.getName();
    } else if (fileFormat == FileFormat.AVRO) {
      return AvroContainerOutputFormat.class.getName();
    }
    return null;
  }

  public String getFileSerdeClassName() {
    FileFormat fileFormat = FileFormat.fromString(properties.getProperty(TableProperties.DEFAULT_FILE_FORMAT));
    if (fileFormat == FileFormat.ORC) {
      return OrcSerde.class.getName();
    } else if (fileFormat == FileFormat.PARQUET) {
      return ParquetHiveSerDe.class.getName();
    } else if (fileFormat == FileFormat.AVRO) {
      return AvroSerDe.class.getName();
    }
    return null;
  }
```

Review Comment: Replaced with the existing StorageFormatDescriptor. Done.

## ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java:

@@ -527,4 +593,20 @@ private Map> getManifestDirs(FileSystem inpFs, List

```java
    }
    return manifestDirsToPaths;
  }

  private void setMergePropertiesToPartDesc(PartitionDesc partitionDesc,
      StorageHandlerMergeProperties mergeProperties) throws ClassNotFoundException {
    String inputFileFormatClassName = mergeProperties.getInputFileFormatClassName();
    String outputFileFormatClassName = mergeProperties.getOutputFileFormatClassName();
    String serdeClassName = mergeProperties.getFileSerdeClassName();
```

Review Comment: Added a null check.
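The `get*ClassName()` methods in the diff above all follow the same format-to-class mapping and fall back to `null`, which is why the resolver side needed the null check mentioned in the review. A self-contained sketch of that pattern (the enum and class here are local stand-ins, not Hive's or Iceberg's real types; the returned strings mirror the class names quoted in the diff):

```java
public class FormatMappingSketch {

  // Local stand-in for Iceberg's FileFormat enum.
  enum FileFormat { ORC, PARQUET, AVRO, METADATA }

  // Maps a file format to a serde class name; returns null for formats the
  // mapping does not cover, so callers must null-check the result.
  static String serdeClassName(FileFormat fileFormat) {
    switch (fileFormat) {
      case ORC:
        return "org.apache.hadoop.hive.ql.io.orc.OrcSerde";
      case PARQUET:
        return "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe";
      case AVRO:
        return "org.apache.hadoop.hive.serde2.avro.AvroSerDe";
      default:
        return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(serdeClassName(FileFormat.ORC));
    System.out.println(serdeClassName(FileFormat.METADATA));
  }
}
```

The `null` fallback is the design choice under discussion: an unsupported format silently yields no class name, so every consumer of the mapping carries the burden of checking for it.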
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505587079

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {

```java
        LOG.info("CommitTask found no serialized table in config for table: {}.", output);
      }
    }, IOException.class);

    // The merge task has merged several files into one, so we need to remove the stale files.
    // At this stage the file is written and task-committed, but the old files are still present.
    if (CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) {
      MapWork mrwork = Utilities.getMapWork(jobConf);
      if (mrwork != null) {
        List<Path> mergedPaths = mrwork.getInputPaths();
        if (CollectionUtils.isNotEmpty(mergedPaths)) {
          Tasks.foreach(mergedPaths)
```

Review Comment: Removed the empty-collection check and replaced it with the null check, which is necessary.
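The cleanup step being reviewed above — delete the merge task's input files once the merged output is task-committed — can be sketched with plain `java.nio.file` calls. `deleteMergedInputs` is a hypothetical helper, not the PR's API; the null check on the path list mirrors the review resolution:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StaleFileCleanupSketch {

  // Deletes the stale input paths a merge step consumed; returns how many
  // files were actually removed. Tolerates a null list, per the review.
  static int deleteMergedInputs(List<Path> mergedPaths) throws IOException {
    int deleted = 0;
    if (mergedPaths != null) {
      for (Path p : mergedPaths) {
        if (Files.deleteIfExists(p)) {
          deleted++;
        }
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path a = Files.createTempFile("forCommit", ".tmp");
    Path b = Files.createTempFile("forCommit", ".tmp");
    System.out.println(deleteMergedInputs(List.of(a, b)));
    System.out.println(deleteMergedInputs(null));
  }
}
```

`Files.deleteIfExists` also tolerates paths that were already removed by a retried attempt, which is the kind of idempotence a task-commit cleanup step generally wants.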
Re: [PR] HIVE-28069: Iceberg: Implement Merge task functionality for Iceberg tables [hive]
SourabhBadhya commented on code in PR #5076: URL: https://github.com/apache/hive/pull/5076#discussion_r1505586021

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -163,6 +167,26 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {

```java
        LOG.info("CommitTask found no serialized table in config for table: {}.", output);
      }
    }, IOException.class);

    // The merge task has merged several files into one, so we need to remove the stale files.
    // At this stage the file is written and task-committed, but the old files are still present.
    if (CombineHiveInputFormat.class.equals(jobConf.getInputFormat().getClass())) {
```

Review Comment: Done.