[jira] [Work logged] (HIVE-25791) Improve SFS exception messages
[ https://issues.apache.org/jira/browse/HIVE-25791?focusedWorklogId=696424&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696424 ]

ASF GitHub Bot logged work on HIVE-25791:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:49
Start Date: 15/Dec/21 07:49
Worklog Time Spent: 10m

Work Description: kgyrtkirk merged pull request #2859:
URL: https://github.com/apache/hive/pull/2859

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 696424)
Time Spent: 20m (was: 10m)

> Improve SFS exception messages
> ------------------------------
>
>                 Key: HIVE-25791
>                 URL: https://issues.apache.org/jira/browse/HIVE-25791
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Especially for cases when the path is already known to be invalid; like:
> {code}sfs+file:///nonexistent/nonexistent.txt/#SINGLEFILE#{code}

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25791) Improve SFS exception messages
[ https://issues.apache.org/jira/browse/HIVE-25791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich resolved HIVE-25791.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

merged into master. Thank you [~kkasa] for reviewing the changes!

> Improve SFS exception messages
> ------------------------------
>
>                 Key: HIVE-25791
>                 URL: https://issues.apache.org/jira/browse/HIVE-25791
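HIVE-25791 is about making the failure message carry the invalid path. As a loose illustration of the idea only (the `resolve_singlefile` helper is hypothetical, not Hive's actual SFS code):

```python
# Illustrative sketch only -- resolve_singlefile() is a hypothetical helper,
# not Hive's SFS implementation.
import os

def resolve_singlefile(path):
    # Fail fast with a message that names the scheme and the offending path,
    # instead of surfacing a generic error later during query planning.
    if not os.path.exists(path):
        raise FileNotFoundError("SFS: underlying file does not exist: " + path)
    return path

try:
    resolve_singlefile("/nonexistent/nonexistent.txt")
    message = None
except FileNotFoundError as e:
    message = str(e)

assert message == "SFS: underlying file does not exist: /nonexistent/nonexistent.txt"
```

The point is only the shape of the error: the invalid `sfs+file://` path is echoed back verbatim rather than lost behind a generic exception.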
[jira] [Updated] (HIVE-24893) Download data from Thriftserver through JDBC
[ https://issues.apache.org/jira/browse/HIVE-24893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24893:
----------------------------------
    Labels: pull-request-available  (was: )

> Download data from Thriftserver through JDBC
> --------------------------------------------
>
>                 Key: HIVE-24893
>                 URL: https://issues.apache.org/jira/browse/HIVE-24893
>             Project: Hive
>          Issue Type: New Feature
>          Components: HiveServer2, JDBC
>    Affects Versions: 4.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is very useful to support downloading large amounts of data (such as more than 50GB) through JDBC.
> Snowflake has similar support:
> https://docs.snowflake.com/en/user-guide/jdbc-using.html#label-jdbc-download-from-stage-to-stream
> https://github.com/snowflakedb/snowflake-jdbc/blob/95a7d8a03316093430dc3960df6635643208b6fd/src/main/java/net/snowflake/client/jdbc/SnowflakeConnectionV1.java#L886
[jira] [Work logged] (HIVE-24893) Download data from Thriftserver through JDBC
[ https://issues.apache.org/jira/browse/HIVE-24893?focusedWorklogId=696409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696409 ]

ASF GitHub Bot logged work on HIVE-24893:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:22
Start Date: 15/Dec/21 07:22
Worklog Time Spent: 10m

Work Description: wangyum opened a new pull request #2878:
URL: https://github.com/apache/hive/pull/2878

### What changes were proposed in this pull request?
Add `UploadData` and `DownloadData` to TCLIService.thrift.

### Why are the changes needed?
It is very useful to support downloading large amounts of data (such as more than 50GB) through JDBC.
Snowflake has similar support:
https://docs.snowflake.com/en/user-guide/jdbc-using.html#label-jdbc-download-from-stage-to-stream
https://github.com/snowflakedb/snowflake-jdbc/blob/95a7d8a03316093430dc3960df6635643208b6fd/src/main/java/net/snowflake/client/jdbc/SnowflakeConnectionV1.java#L886

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
// TODO

Issue Time Tracking
-------------------
Worklog Id: (was: 696409)
Remaining Estimate: 0h
Time Spent: 10m

> Download data from Thriftserver through JDBC
> --------------------------------------------
>
>                 Key: HIVE-24893
>                 URL: https://issues.apache.org/jira/browse/HIVE-24893
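The PR proposes new `UploadData`/`DownloadData` Thrift calls; the sketch below only illustrates the general chunked-download idea in Python (the function and chunk source are made up for illustration, not the proposed API):

```python
# Illustrative sketch only (HIVE-24893 proposes Thrift-level calls; this is
# not that API): stream a large result into a local buffer chunk by chunk,
# so the client never materializes the whole, possibly 50GB+, payload.
import io

def download_data(chunks, out):
    """Write server-supplied chunks to `out`, returning total bytes moved."""
    total = 0
    for chunk in chunks:          # each chunk would arrive via one RPC
        out.write(chunk)
        total += len(chunk)
    return total

server_chunks = (b"x" * 4 for _ in range(3))   # stand-in for RPC responses
buf = io.BytesIO()
assert download_data(server_chunks, buf) == 12
```

Memory stays bounded by the chunk size regardless of result-set size, which is the motivation stated in the ticket.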
[jira] [Work logged] (HIVE-25805) Wrong result when rebuilding MV with count(col) incrementally
[ https://issues.apache.org/jira/browse/HIVE-25805?focusedWorklogId=696408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696408 ]

ASF GitHub Bot logged work on HIVE-25805:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:22
Start Date: 15/Dec/21 07:22
Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on a change in pull request #2872:
URL: https://github.com/apache/hive/pull/2872#discussion_r769317236

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateInsertDeleteIncrementalRewritingRule.java

@@ -139,7 +139,14 @@ protected IncrementalComputePlanWithDeletedRows createJoinRightInput(RelOptRuleC
       switch (aggregateCall.getAggregation().getKind()) {
       case COUNT:
         aggFunction = SqlStdOperatorTable.SUM;
-        argument = relBuilder.literal(1);
+
+        // count(*)
+        if (aggregateCall.getArgList().isEmpty()) {
+          argument = relBuilder.literal(1);
+        } else {
+          // count(col)
+          argument = genArgumentForCountColumn(relBuilder, rexBuilder, aggInput, aggregateCall);

Review comment:
notes:
I think you could access `rexBuilder` from `relBuilder`
I think you could also push the enclosing `if` into this function; or inline the whole function - but right now you have some "count argument" related logic here and there too

Issue Time Tracking
-------------------
Worklog Id: (was: 696408)
Time Spent: 20m (was: 10m)

> Wrong result when rebuilding MV with count(col) incrementally
> -------------------------------------------------------------
>
>                 Key: HIVE-25805
>                 URL: https://issues.apache.org/jira/browse/HIVE-25805
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Materialized views
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> create table t1(a char(15), b int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into t1(a, b) values ('old', 1);
>
> create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as
> select t1.a, count(t1.b), count(*) from t1 group by t1.a;
>
> delete from t1 where b = 1;
> insert into t1(a,b) values ('new', null);
>
> alter materialized view mat1 rebuild;
> select * from mat1;
> {code}
> returns
> {code:java}
> new 1 1
> {code}
> but, should be
> {code:java}
> new 0 1
> {code}
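The hunk under review distinguishes `count(*)` from `count(col)` when rewriting the aggregate as an incremental `SUM`. A minimal Python simulation of why that distinction matters (illustrative only, not Hive code):

```python
# Illustrative sketch (not Hive code): an incremental MV rebuild rewrites
# count(...) as SUM over a per-row argument. For count(*) the argument is the
# literal 1; for count(col) it must be null-aware, or NULL rows get counted.

def incremental_count(delta_rows, col, count_star=False):
    """Simulate the rewritten aggregate: SUM over a per-row argument."""
    if count_star:
        # count(*): every delta row contributes 1
        return sum(1 for _ in delta_rows)
    # count(col): NULLs must contribute 0, matching count(col) semantics
    return sum(0 if row[col] is None else 1 for row in delta_rows)

# The repro's delta after the rebuild: one row ('new', NULL)
delta = [{"a": "new", "b": None}]

# count(*) over the delta counts the NULL row ...
assert incremental_count(delta, "b", count_star=True) == 1
# ... but count(b) must not -- summing literal 1 here is exactly the
# "new 1 1" wrong result from the ticket; the correct answer is "new 0 1".
assert incremental_count(delta, "b") == 0
```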
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696405 ]

ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:16
Start Date: 15/Dec/21 07:16
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314898

## File path: ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBOReCompilation.java

@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.DriverFactory;
+import org.apache.hadoop.hive.ql.IDriver;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hive.testutils.HiveTestEnvSetup;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+
+public class TestCBOReCompilation {
+
+  @ClassRule
+  public static HiveTestEnvSetup env_setup = new HiveTestEnvSetup();
+
+  @BeforeClass
+  public static void beforeClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+      String[] cmds = {
+          // @formatter:off
+          "create table aa1 ( stf_id string)",
+          "create table bb1 ( stf_id string)",
+          "create table cc1 ( stf_id string)",
+          "create table ff1 ( x string)"
+          // @formatter:on
+      };
+      for (String cmd : cmds) {
+        driver.run(cmd);
+      }
+    }
+  }
+
+  @AfterClass
+  public static void afterClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+    }
+  }
+
+  public static void dropTables(IDriver driver) throws Exception {
+    String[] tables = new String[] {"aa1", "bb1", "cc1", "ff1" };
+    for (String t : tables) {
+      driver.run("drop table if exists " + t);
+    }
+  }
+
+  @Test
+  public void testReExecutedOnError() throws Exception {
+    try (IDriver driver = createDriver("ALWAYS")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      driver.run(query);
+    }
+  }
+
+  @Test
+  public void testFailOnError() throws Exception {
+    try (IDriver driver = createDriver("TEST")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      Assert.assertThrows("Plan not optimized by CBO", CommandProcessorException.class, () -> driver.run(query));

Review comment:
Found other tests created by @zabetak, so no need to keep these

Issue Time Tracking
-------------------
Worklog Id: (was: 696405)
Time Spent: 3.5h (was: 3h 20m)

> Multi Insert query fails on CBO path
> ------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> {code}
> set hive.cbo.enable=true;
> drop table if exists aa1;
> drop table if exists bb1;
> drop table if exists cc1;
>
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696403 ]

ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:15
Start Date: 15/Dec/21 07:15
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314437

## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java

@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }

   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {
+    // default no
+    return false;
+  }

   /**
-   * The plugin should prepare for the re-compilaton of the query.
+   * The plugin should prepare for the re-compilation of the query.
    */
-  void prepareToReExecute();
+  default void prepareToReExecute() {
+    // default noop
+  }

   /**
-   * The query have failed; and have been recompiled - does this plugin advises to re-execute it again?
+   * The query has failed; and have been recompiled - does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper);
+  default boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {

Review comment:
Keeping as discussed

Issue Time Tracking
-------------------
Worklog Id: (was: 696403)
Time Spent: 3h 10m (was: 3h)

> Multi Insert query fails on CBO path
> ------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> {code}
> set hive.cbo.enable=true;
> drop table if exists aa1;
> drop table if exists bb1;
> drop table if exists cc1;
> drop table if exists dd1;
> drop table if exists ee1;
> drop table if exists ff1;
> create table aa1 ( stf_id string);
> create table bb1 ( stf_id string);
> create table cc1 ( stf_id string);
> create table ff1 ( x string);
>
> explain
> from ff1 as a join cc1 as b
> insert overwrite table aa1 select stf_id GROUP BY b.stf_id
> insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id
> ;
> {code}
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696404 ]

ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:15
Start Date: 15/Dec/21 07:15
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314666

## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java

@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.parse.CBOException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible = false;
+  private CBOFallbackStrategy fallbackStrategy;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {
+        Throwable throwable = ctx.getHookContext().getException();
+        if (throwable != null) {
+          if (throwable instanceof CBOException) {
+            // Determine if we should re-throw the exception OR if we retry planning with non-CBO.
+            if (fallbackStrategy.isFatal((CBOException) throwable)) {
+              Throwable cause = throwable.getCause();
+              if (cause instanceof RuntimeException || cause instanceof SemanticException) {
+                // These types of exceptions do not need wrapped
+                retryPossible = false;
+                return;
+              }
+              // Wrap all other errors (Should only hit in tests)
+              throw new RuntimeException(cause);
+            } else {
+              // Only if the exception is a CBOException then we can retry
+              retryPossible = true;

Review comment:
Done

Issue Time Tracking
-------------------
Worklog Id: (was: 696404)
Time Spent: 3h 20m (was: 3h 10m)

> Multi Insert query fails on CBO path
> ------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
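The plugin above decides, via `retryPossible` and the fallback strategy, whether a failed CBO compilation should be retried on the non-CBO path. A rough Python sketch of that control flow (all names are illustrative, not Hive's API):

```python
# Illustrative sketch (not Hive's API): compile with the cost-based optimizer
# first; when that fails with a retryable error, recompile on the non-CBO path.

class CBOError(RuntimeError):
    """Stand-in for CBOException; `fatal` mimics the fallback strategy."""
    def __init__(self, msg, fatal=False):
        super().__init__(msg)
        self.fatal = fatal

def compile_query(query, compile_with_cbo, compile_without_cbo):
    try:
        return compile_with_cbo(query)
    except CBOError as e:
        if e.fatal:                  # fallback strategy says: do not retry
            raise
        return compile_without_cbo(query)

attempts = []

def cbo(query):
    attempts.append("cbo")
    raise CBOError("Plan not optimized by CBO")   # retryable failure

def non_cbo(query):
    attempts.append("non-cbo")
    return "plan"

assert compile_query("select 1", cbo, non_cbo) == "plan"
assert attempts == ["cbo", "non-cbo"]
```

A fatal `CBOError` propagates instead of being retried, mirroring `CBOFallbackStrategy.isFatal` in the diff.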
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696402 ]

ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:14
Start Date: 15/Dec/21 07:14
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314240

## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java

@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
     }

     PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-    afterExecute(oldPlanMapper, cpr != null);
+    final boolean success = cpr != null;
+    plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+    // If the execution was successful return the result
+    if (success) {

Review comment:
Reverted this change

Issue Time Tracking
-------------------
Worklog Id: (was: 696402)
Time Spent: 3h (was: 2h 50m)

> Multi Insert query fails on CBO path
> ------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696401 ]

ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 07:14
Start Date: 15/Dec/21 07:14
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769313969

## File path: ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java

@@ -121,19 +121,27 @@ void runBeforeCompileHook(String command) {
   }

   /**
-   * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
-   *
-   * @param command the Hive command that is being run
-   * @param compileError true if there was an error while compiling the command, false otherwise
-   */
-  void runAfterCompilationHook(String command, boolean compileError) {
+   * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
+   *
+   * @param driverContext the DriverContext used for generating the HookContext
+   * @param analyzerContext the SemanticAnalyzer context for this query
+   * @param compileException the exception if one was thrown during the compilation
+   */
+  void runAfterCompilationHook(DriverContext driverContext, Context analyzerContext, Throwable compileException) {

Review comment:
Keeping as it is based on our discussion

Issue Time Tracking
-------------------
Worklog Id: (was: 696401)
Time Spent: 2h 50m (was: 2h 40m)

> Multi Insert query fails on CBO path
> ------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
[jira] [Resolved] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata
[ https://issues.apache.org/jira/browse/HIVE-25744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-25744.
-----------------------------------
    Resolution: Fixed

Pushed to master. Thanks [~kgyrtkirk] for review.

> Support backward compatibility of thrift struct CreationMetadata
> ----------------------------------------------------------------
>
>                 Key: HIVE-25744
>                 URL: https://issues.apache.org/jira/browse/HIVE-25744
>             Project: Hive
>          Issue Type: Task
>          Components: Materialized views, Thrift API
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-25656 introduced a breaking change in the HiveServer2 <-> Metastore thrift api:
> Old
> {code}
> struct CreationMetadata {
>     1: required string catName
>     2: required string dbName,
>     3: required string tblName,
>     4: required set tablesUsed,
>     5: optional string validTxnList,
>     6: optional i64 materializationTime
> }
> {code}
> New
> {code}
> struct CreationMetadata {
>     1: required string catName
>     2: required string dbName,
>     3: required string tblName,
>     4: required set tablesUsed,
>     5: optional string validTxnList,
>     6: optional i64 materializationTime
> }
> {code}
> 4th field type changed
[jira] [Work logged] (HIVE-25744) Support backward compatibility of thrift struct CreationMetadata
[ https://issues.apache.org/jira/browse/HIVE-25744?focusedWorklogId=696360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696360 ]

ASF GitHub Bot logged work on HIVE-25744:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 05:27
Start Date: 15/Dec/21 05:27
Worklog Time Spent: 10m

Work Description: kasakrisz merged pull request #2821:
URL: https://github.com/apache/hive/pull/2821

Issue Time Tracking
-------------------
Worklog Id: (was: 696360)
Time Spent: 20m (was: 10m)

> Support backward compatibility of thrift struct CreationMetadata
> ----------------------------------------------------------------
>
>                 Key: HIVE-25744
>                 URL: https://issues.apache.org/jira/browse/HIVE-25744
[jira] [Updated] (HIVE-25783) Enforce ASF headers on Metastore
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-25783:
-------------------------------
    Summary: Enforce ASF headers on Metastore  (was: Provide rat check to the CI)

> Enforce ASF headers on Metastore
> --------------------------------
>
>                 Key: HIVE-25783
>                 URL: https://issues.apache.org/jira/browse/HIVE-25783
>             Project: Hive
>          Issue Type: Improvement
>          Components: Build Infrastructure
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>
> The Jira tries to investigate if we can provide a rat check to the CI, making sure that newly added source files contain the ASF license information.
[jira] [Updated] (HIVE-25615) Hive on tez will generate at least one MapContainer per 0 length file
[ https://issues.apache.org/jira/browse/HIVE-25615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25615:
----------------------------------
    Labels: pull-request-available  (was: )

> Hive on tez will generate at least one MapContainer per 0 length file
> ---------------------------------------------------------------------
>
>                 Key: HIVE-25615
>                 URL: https://issues.apache.org/jira/browse/HIVE-25615
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, Query Processor, Tez
>    Affects Versions: 3.1.2
>         Environment: hive-3.1.2
>                      tez-0.10.1
>            Reporter: JackYan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When tez reads a table with many partitions and those partitions contain only 0-length files, ColumnarSplitSizeEstimator returns Integer.MAX_VALUE bytes as the length of every 0-length file. TezSplitGrouper then treats those files as big files and generates at least one MapContainer per 0-length file to handle them. This is incorrect and even wasteful.
[jira] [Work logged] (HIVE-25615) Hive on tez will generate at least one MapContainer per 0 length file
[ https://issues.apache.org/jira/browse/HIVE-25615?focusedWorklogId=696251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696251 ]

ASF GitHub Bot logged work on HIVE-25615:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Dec/21 00:12
Start Date: 15/Dec/21 00:12
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #2723:
URL: https://github.com/apache/hive/pull/2723#issuecomment-994158113

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
Worklog Id: (was: 696251)
Remaining Estimate: 0h
Time Spent: 10m

> Hive on tez will generate at least one MapContainer per 0 length file
> ---------------------------------------------------------------------
>
>                 Key: HIVE-25615
>                 URL: https://issues.apache.org/jira/browse/HIVE-25615
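To see why an Integer.MAX_VALUE size estimate for empty files defeats split grouping, here is a toy Python simulation (a greedy grouper standing in for TezSplitGrouper; illustrative only, not the actual Tez code):

```python
# Illustrative sketch (not Tez code): a split-size estimator that reports
# INT_MAX for 0-length files makes every empty file look "big", so a greedy
# grouper puts each one in its own group (one task/container per file).

INT_MAX = 2**31 - 1

def estimate(file_len, buggy=True):
    # Reported behavior: a 0-length columnar file is estimated at INT_MAX
    if file_len == 0:
        return INT_MAX if buggy else 0
    return file_len

def group_splits(file_lens, target_group_size, buggy=True):
    """Greedy grouping: pack splits until the estimated size hits the target."""
    groups, current, current_size = [], [], 0
    for length in file_lens:
        est = estimate(length, buggy)
        if current and current_size + est > target_group_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(length)
        current_size += est
    if current:
        groups.append(current)
    return groups

empty_files = [0] * 4
# Buggy estimate: each empty file overflows the group -> 4 groups (4 tasks).
assert len(group_splits(empty_files, 256 * 1024 * 1024, buggy=True)) == 4
# Estimating empty files at 0 bytes lets them all collapse into one group.
assert len(group_splits(empty_files, 256 * 1024 * 1024, buggy=False)) == 1
```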
[jira] [Work logged] (HIVE-25800) loadDynamicPartitions in Hive.java should not load all partitions of a managed table
[ https://issues.apache.org/jira/browse/HIVE-25800?focusedWorklogId=696055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696055 ] ASF GitHub Bot logged work on HIVE-25800: - Author: ASF GitHub Bot Created on: 14/Dec/21 18:57 Start Date: 14/Dec/21 18:57 Worklog Time Spent: 10m Work Description: sourabh912 commented on pull request #2868: URL: https://github.com/apache/hive/pull/2868#issuecomment-993883393 @kgyrtkirk : Please review and provide your feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 696055) Time Spent: 0.5h (was: 20m) > loadDynamicPartitions in Hive.java should not load all partitions of a > managed table > - > > Key: HIVE-25800 > URL: https://issues.apache.org/jira/browse/HIVE-25800 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-20661 added an improvement in loadDynamicPartitions() api in Hive.java > to not add partitions one by one in HMS. 
As part of that improvement, the following code was introduced:
> {code:java}
> // fetch all the partitions matching the part spec using the partition iterable
> // this way the maximum batch size configuration parameter is considered
> PartitionIterable partitionIterable = new PartitionIterable(Hive.get(), tbl, partSpec,
>     conf.getInt(MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX.getVarname(), 300));
> Iterator iterator = partitionIterable.iterator();
> // Match valid partition path to partitions
> while (iterator.hasNext()) {
>   Partition partition = iterator.next();
>   partitionDetailsMap.entrySet().stream()
>       .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
>       .findAny().ifPresent(entry -> {
>         entry.getValue().partition = partition;
>         entry.getValue().hasOldPartition = true;
>       });
> } {code}
> The above code fetches all the existing partitions for a table from HMS and compares them against the dynamic-partition list to decide which old and new partitions are to be added to HMS (in batches). The call to fetch all partitions has introduced a performance regression for tables with a large number of partitions (of the order of 100K).
> This is fixed for external tables in https://issues.apache.org/jira/browse/HIVE-25178. However, for ACID tables there is an open Jira (HIVE-25187). Until we have an appropriate fix in HIVE-25187, we can apply the following: skip fetching all partitions; instead, in the thread pool which loads each partition individually, call get_partition() to check whether the partition already exists in HMS or not.
> This will introduce an additional getPartition() call for every partition to be loaded dynamically but removes fetching all existing partitions for a table.
> I believe this is fine: for tables with a small number of existing partitions in HMS, getPartition() won't add too much overhead, while for tables with a large number of existing partitions it certainly avoids fetching all partitions from HMS.
> cc - [~lpinter] [~ngangam] -- This message was sent by Atlassian Jira (v8.20.1#820001)
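The workaround proposed above (one point lookup per dynamic partition instead of listing every partition) can be sketched as follows. This is a simplified stand-in, not the actual Hive.java code; `MetastoreClient.partitionExists` is a hypothetical abstraction over a get_partition()-style call:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicPartitionProbeSketch {
    // Hypothetical stand-in for the metastore client; the real code would call
    // something like get_partition() with the partition's full spec.
    interface MetastoreClient {
        boolean partitionExists(String table, Map<String, String> fullSpec);
    }

    // For each dynamic partition being loaded, record whether an old partition
    // already exists in HMS: one point lookup each, no full listing of the table.
    static Map<Map<String, String>, Boolean> markOldPartitions(
            MetastoreClient client, String table, List<Map<String, String>> toLoad) {
        Map<Map<String, String>, Boolean> hasOldPartition = new LinkedHashMap<>();
        for (Map<String, String> fullSpec : toLoad) {
            hasOldPartition.put(fullSpec, client.partitionExists(table, fullSpec));
        }
        return hasOldPartition;
    }
}
```

The trade-off matches the comment above: N point lookups for N partitions being loaded, independent of how many partitions already exist in the table.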
[jira] [Updated] (HIVE-25809) Implement URI Mapping for KuduStorageHandler in Hive
[ https://issues.apache.org/jira/browse/HIVE-25809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25809: -- Labels: pull-request-available (was: ) > Implement URI Mapping for KuduStorageHandler in Hive > - > > Key: HIVE-25809 > URL: https://issues.apache.org/jira/browse/HIVE-25809 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Security >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there is no storage URI mapping for KuduStorageHandler based on > the feature HIVE-24705. The API getURIForAuth() needs to be implemented in > KuduStorageHandler. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25809) Implement URI Mapping for KuduStorageHandler in Hive
[ https://issues.apache.org/jira/browse/HIVE-25809?focusedWorklogId=696036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696036 ] ASF GitHub Bot logged work on HIVE-25809: - Author: ASF GitHub Bot Created on: 14/Dec/21 18:36 Start Date: 14/Dec/21 18:36 Worklog Time Spent: 10m Work Description: saihemanth-cloudera opened a new pull request #2877: URL: https://github.com/apache/hive/pull/2877 ### What changes were proposed in this pull request? Implemented getURIForAuth() API in the kudu storage handler ### Why are the changes needed? To prevent a user breaching hive-24705 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Local machine, remote cluster -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 696036) Remaining Estimate: 0h Time Spent: 10m > Implement URI Mapping for KuduStorageHandler in Hive > - > > Key: HIVE-25809 > URL: https://issues.apache.org/jira/browse/HIVE-25809 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Security >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, there is no storage URI mapping for KuduStorageHandler based on > the feature HIVE-24705. The API getURIForAuth() needs to be implemented in > KuduStorageHandler. -- This message was sent by Atlassian Jira (v8.20.1#820001)
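For context, a storage handler's getURIForAuth() typically assembles an authorization URI from table properties. The sketch below only illustrates that pattern: the property keys `kudu.master_addresses` and `kudu.table_name` are standard Kudu table properties, but the `kudu://<masters>/<table>` format shown here is an assumption for illustration, not necessarily the format the HIVE-25809 patch implements:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

public class KuduAuthUriSketch {
    // Illustrative mapping only: kudu://<masters>/<table>. The actual URI
    // format chosen by the real getURIForAuth() implementation may differ.
    static URI uriForAuth(Map<String, String> tableProps) {
        String masters = tableProps.getOrDefault("kudu.master_addresses", "");
        String table = tableProps.getOrDefault("kudu.table_name", "");
        try {
            return new URI("kudu://" + masters + "/" + table);
        } catch (URISyntaxException e) {
            throw new IllegalStateException("invalid Kudu auth URI", e);
        }
    }
}
```

An authorizer (e.g. Ranger) can then match policies against the returned URI instead of having no object to authorize against.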
[jira] [Work logged] (HIVE-25795) [CVE-2021-44228] Update log4j2 version to 2.15.0
[ https://issues.apache.org/jira/browse/HIVE-25795?focusedWorklogId=696020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696020 ] ASF GitHub Bot logged work on HIVE-25795: - Author: ASF GitHub Bot Created on: 14/Dec/21 18:19 Start Date: 14/Dec/21 18:19 Worklog Time Spent: 10m Work Description: kevinverhoeven commented on pull request #2863: URL: https://github.com/apache/hive/pull/2863#issuecomment-993853334 @guptanikhil007 thank you for this change, are you planning a fix for 2.x and 3.x? This fix is applied to 4.x which has not been released. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 696020) Time Spent: 4h 20m (was: 4h 10m) > [CVE-2021-44228] Update log4j2 version to 2.15.0 > > > Key: HIVE-25795 > URL: https://issues.apache.org/jira/browse/HIVE-25795 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > [Worst Apache Log4j RCE Zero day Dropped on Internet - Cyber > Kendra|https://www.cyberkendra.com/2021/12/worst-log4j-rce-zeroday-dropped-on.html] > Vulnerability: > https://github.com/apache/logging-log4j2/commit/7fe72d6 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25809) Implement URI Mapping for KuduStorageHandler in Hive
[ https://issues.apache.org/jira/browse/HIVE-25809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-25809: > Implement URI Mapping for KuduStorageHandler in Hive > - > > Key: HIVE-25809 > URL: https://issues.apache.org/jira/browse/HIVE-25809 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Security >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Currently, there is no storage URI mapping for KuduStorageHandler based on > the feature HIVE-24705. The API getURIForAuth() needs to be implemented in > KuduStorageHandler. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25795) [CVE-2021-44228] Update log4j2 version to 2.15.0
[ https://issues.apache.org/jira/browse/HIVE-25795?focusedWorklogId=696007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696007 ] ASF GitHub Bot logged work on HIVE-25795: - Author: ASF GitHub Bot Created on: 14/Dec/21 18:06 Start Date: 14/Dec/21 18:06 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2876: URL: https://github.com/apache/hive/pull/2876#issuecomment-993842290 @yongzhi Could you please review? Thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 696007) Time Spent: 4h 10m (was: 4h) > [CVE-2021-44228] Update log4j2 version to 2.15.0 > > > Key: HIVE-25795 > URL: https://issues.apache.org/jira/browse/HIVE-25795 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 3.1.2, 4.0.0 >Reporter: Nikhil Gupta >Assignee: Nikhil Gupta >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > [Worst Apache Log4j RCE Zero day Dropped on Internet - Cyber > Kendra|https://www.cyberkendra.com/2021/12/worst-log4j-rce-zeroday-dropped-on.html] > Vulnerability: > https://github.com/apache/logging-log4j2/commit/7fe72d6 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25808) Analyse table does not fail for non existing partitions
[ https://issues.apache.org/jira/browse/HIVE-25808?focusedWorklogId=695905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695905 ] ASF GitHub Bot logged work on HIVE-25808: - Author: ASF GitHub Bot Created on: 14/Dec/21 15:48 Start Date: 14/Dec/21 15:48 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #2875: URL: https://github.com/apache/hive/pull/2875 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695905) Remaining Estimate: 0h Time Spent: 10m > Analyse table does not fail for non existing partitions > --- > > Key: HIVE-25808 > URL: https://issues.apache.org/jira/browse/HIVE-25808 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If all the partition column values are given in the analyze command, then the query fails for a non-existing partition, but if not all the partition column values are given, it does not fail. > analyze table tbl partition *(fld1 = 2, fld2 = 3)* COMPUTE STATISTICS FOR COLUMNS – this will fail with SemanticException if the partition corresponding to fld1 = 2, fld2 = 3 does not exist. But analyze table tbl partition *(fld1 = 2)* COMPUTE STATISTICS FOR COLUMNS will not fail; it will compute stats for the whole table. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
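The inconsistency described above suggests validating a partial partition spec the same way a full spec is validated: a (possibly partial) spec must match at least one existing partition, otherwise the analyze statement should fail. A minimal, hypothetical sketch of that check, not the actual Hive semantic-analyzer code:

```java
import java.util.List;
import java.util.Map;

public class PartitionSpecValidationSketch {
    // Fail if the (possibly partial) spec matches no existing partition,
    // mirroring the SemanticException a full spec already triggers.
    static void validateSpec(List<Map<String, String>> existingPartitions,
                             Map<String, String> spec) {
        boolean anyMatch = existingPartitions.stream()
                .anyMatch(p -> p.entrySet().containsAll(spec.entrySet()));
        if (!anyMatch) {
            throw new IllegalArgumentException("No partition matches spec " + spec);
        }
    }
}
```

With this rule, `partition (fld1 = 2)` fails exactly like `partition (fld1 = 2, fld2 = 3)` when no partition with fld1 = 2 exists, instead of silently computing stats for the whole table.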
[jira] [Updated] (HIVE-25808) Analyse table does not fail for non existing partitions
[ https://issues.apache.org/jira/browse/HIVE-25808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25808: -- Labels: pull-request-available (was: ) > Analyse table does not fail for non existing partitions > --- > > Key: HIVE-25808 > URL: https://issues.apache.org/jira/browse/HIVE-25808 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If all the column names are given in the analyse command , then the query > fails. But if all the partition column values are not given then its not > failing. > analyze table tbl partition *(fld1 = 2, fld2 = 3)* COMPUTE STATISTICS FOR > COLUMNS – This will fail with SemanticException, if partition corresponds to > fld1 = 2, fld2 = 3 does not exists. But analyze table tbl partition *(fld1 = > 2)* COMPUTE STATISTICS FOR COLUMNS, this will not fail and it will compute > stats for whole table. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25808) Analyse table does not fail for non existing partitions
[ https://issues.apache.org/jira/browse/HIVE-25808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera reassigned HIVE-25808: -- Assignee: mahesh kumar behera > Analyse table does not fail for non existing partitions > --- > > Key: HIVE-25808 > URL: https://issues.apache.org/jira/browse/HIVE-25808 > Project: Hive > Issue Type: Bug >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > > If all the column names are given in the analyse command , then the query > fails. But if all the partition column values are not given then its not > failing. > analyze table tbl partition *(fld1 = 2, fld2 = 3)* COMPUTE STATISTICS FOR > COLUMNS – This will fail with SemanticException, if partition corresponds to > fld1 = 2, fld2 = 3 does not exists. But analyze table tbl partition *(fld1 = > 2)* COMPUTE STATISTICS FOR COLUMNS, this will not fail and it will compute > stats for whole table. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25540) Enable batch update of column stats only for MySql and Postgres
[ https://issues.apache.org/jira/browse/HIVE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459241#comment-17459241 ] mahesh kumar behera commented on HIVE-25540: [~zabetak] The batch update uses direct SQL to optimize the number of backend database calls. Some of the SQL used is not supported by Oracle, so we need to add a check to go via DN (DataNucleus) if the backend DB is Oracle. Currently we have tested only on MySQL and Postgres. The batch update feature has not shipped yet. > Enable batch update of column stats only for MySql and Postgres > > > Key: HIVE-25540 > URL: https://issues.apache.org/jira/browse/HIVE-25540 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The batch update of partition column stats using direct SQL is tested only for MySQL and Postgres. -- This message was sent by Atlassian Jira (v8.20.1#820001)
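The gating described in this comment can be sketched as a simple capability check. This is a hypothetical illustration; Hive's real decision is made inside the metastore's direct-SQL layer and may be structured differently:

```java
public class BatchStatsGatingSketch {
    enum DbProduct { MYSQL, POSTGRES, ORACLE, DERBY, MSSQL, OTHER }

    // Batched direct-SQL column-stats updates are only verified on MySQL and
    // Postgres; every other backend falls back to the DataNucleus (DN) path.
    static boolean canUseBatchedDirectSql(DbProduct db) {
        return db == DbProduct.MYSQL || db == DbProduct.POSTGRES;
    }
}
```

The design choice is allow-listing: new backends stay on the slower but portable DN path until the direct SQL has been verified against them.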
[jira] [Resolved] (HIVE-25778) Hive DB creation is failing when MANAGEDLOCATION is specified with existing location
[ https://issues.apache.org/jira/browse/HIVE-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera resolved HIVE-25778. Resolution: Won't Fix Closing this Jira as supporting this scenario may lead to data loss/corruption if the user is not very careful. > Hive DB creation is failing when MANAGEDLOCATION is specified with existing > location > > > Key: HIVE-25778 > URL: https://issues.apache.org/jira/browse/HIVE-25778 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of HIVE-23387, a check was added to restrict users from creating a database with a managed table location if the location is already present. This was not the case before. As this causes a backward compatibility issue, the check needs to be removed.
> {code:java}
> if (madeManagedDir) {
>   LOG.info("Created database path in managed directory " + dbMgdPath);
> } else {
>   throw new MetaException(
>       "Unable to create database managed directory " + dbMgdPath + ", failed to create database " + db.getName());
> } {code}
> -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25778) Hive DB creation is failing when MANAGEDLOCATION is specified with existing location
[ https://issues.apache.org/jira/browse/HIVE-25778?focusedWorklogId=695884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695884 ] ASF GitHub Bot logged work on HIVE-25778: - Author: ASF GitHub Bot Created on: 14/Dec/21 15:21 Start Date: 14/Dec/21 15:21 Worklog Time Spent: 10m Work Description: maheshk114 closed pull request #2846: URL: https://github.com/apache/hive/pull/2846 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695884) Time Spent: 0.5h (was: 20m) > Hive DB creation is failing when MANAGEDLOCATION is specified with existing > location > > > Key: HIVE-25778 > URL: https://issues.apache.org/jira/browse/HIVE-25778 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of HIVE-23387 check is added to restrict user from creating database > with managed table location, if the location is already present. This was not > the case. As this is causing backward compatibility issue, the check needs to > be removed. > > {code:java} > if (madeManagedDir) { > LOG.info("Created database path in managed directory " + dbMgdPath); > } else { > throw new MetaException( > "Unable to create database managed directory " + dbMgdPath + ", failed > to create database " + db.getName()); > } {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25778) Hive DB creation is failing when MANAGEDLOCATION is specified with existing location
[ https://issues.apache.org/jira/browse/HIVE-25778?focusedWorklogId=695883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695883 ] ASF GitHub Bot logged work on HIVE-25778: - Author: ASF GitHub Bot Created on: 14/Dec/21 15:20 Start Date: 14/Dec/21 15:20 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #2846: URL: https://github.com/apache/hive/pull/2846#issuecomment-993650984 @pgaref Thanks for the review. I am not committing this as supporting this may lead to data loss/ corruption if user is not very careful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695883) Time Spent: 20m (was: 10m) > Hive DB creation is failing when MANAGEDLOCATION is specified with existing > location > > > Key: HIVE-25778 > URL: https://issues.apache.org/jira/browse/HIVE-25778 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > As part of HIVE-23387 check is added to restrict user from creating database > with managed table location, if the location is already present. This was not > the case. As this is causing backward compatibility issue, the check needs to > be removed. > > {code:java} > if (madeManagedDir) { > LOG.info("Created database path in managed directory " + dbMgdPath); > } else { > throw new MetaException( > "Unable to create database managed directory " + dbMgdPath + ", failed > to create database " + db.getName()); > } {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-14261) Support set/unset partition parameters
[ https://issues.apache.org/jira/browse/HIVE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459178#comment-17459178 ] Stamatis Zampetakis commented on HIVE-14261: [~xiepengjie] The motivation for the change, according to the example posted above, comes from other projects using HMS directly (not via HS2). Nevertheless, the syntax change seems to only affect HS2 thus I don't understand who is going to benefit from this in the end. Can you give some examples of which users/projects are going to use the new syntax and how? > Support set/unset partition parameters > -- > > Key: HIVE-14261 > URL: https://issues.apache.org/jira/browse/HIVE-14261 > Project: Hive > Issue Type: New Feature >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Major > Attachments: HIVE-14261.01.patch > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25807) ok
[ https://issues.apache.org/jira/browse/HIVE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459179#comment-17459179 ] László Bodor commented on HIVE-25807: - [~Din1]: may I ask what this Jira is for? > ok > -- > > Key: HIVE-25807 > URL: https://issues.apache.org/jira/browse/HIVE-25807 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-21172) DEFAULT keyword handling in MERGE UPDATE clause issues
[ https://issues.apache.org/jira/browse/HIVE-21172?focusedWorklogId=695795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695795 ] ASF GitHub Bot logged work on HIVE-21172: - Author: ASF GitHub Bot Created on: 14/Dec/21 13:59 Start Date: 14/Dec/21 13:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2857: URL: https://github.com/apache/hive/pull/2857#discussion_r768692003 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java ## @@ -1711,13 +1713,13 @@ public void testMajorCompactionAfterTwoMergeStatements() throws Exception { // Verify contents of bucket files. List expectedRsBucket0 = Arrays.asList("{\"writeid\":1,\"bucketid\":536870912,\"rowid\":3}\t4\tvalue_4", -"{\"writeid\":2,\"bucketid\":536870912,\"rowid\":0}\t6\tvalue_6", -"{\"writeid\":2,\"bucketid\":536870913,\"rowid\":2}\t3\tnewvalue_3", -"{\"writeid\":3,\"bucketid\":536870912,\"rowid\":0}\t8\tvalue_8", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":0}\t5\tnewestvalue_5", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":1}\t7\tnewestvalue_7", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":2}\t1\tnewestvalue_1", - "{\"writeid\":3,\"bucketid\":536870913,\"rowid\":3}\t2\tnewestvalue_2"); +"{\"writeid\":2,\"bucketid\":536870913,\"rowid\":2}\t3\tnewvalue_3", Review comment: iiuc this test was created to verify the order of records in the bucket files after major compaction. Based on description of https://issues.apache.org/jira/browse/HIVE-25257 It should be ordered by originalTransactionId, bucketProperty, rowId. Unfortunately originalTransactionId can not be queried so I debugged the test and stop the execution before this assert. 
Then I dumped the orc file created on my local fs: ``` java -jar orc-tools-1.6.5/orc-tools-1.6.5-uber.jar data ./itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1639489721524_1338883398/warehouse/comp_and_merge_test/base_003_v014/bucket_0 Processing data file itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1639489721524_1338883398/warehouse/comp_and_merge_test/base_003_v014/bucket_0 [length: 808] {"operation":0,"originaltransaction":1,"bucket":536870912,"rowid":3,"currenttransaction":1,"row":{"id":4,"value":"value_4"}} {"operation":0,"originaltransaction":2,"bucket":536870913,"rowid":2,"currenttransaction":2,"row":{"id":3,"value":"newvalue_3"}} {"operation":0,"originaltransaction":2,"bucket":536870914,"rowid":0,"currenttransaction":2,"row":{"id":6,"value":"value_6"}} {"operation":0,"originaltransaction":3,"bucket":536870913,"rowid":0,"currenttransaction":3,"row":{"id":1,"value":"newestvalue_1"}} {"operation":0,"originaltransaction":3,"bucket":536870913,"rowid":1,"currenttransaction":3,"row":{"id":2,"value":"newestvalue_2"}} {"operation":0,"originaltransaction":3,"bucket":536870913,"rowid":2,"currenttransaction":3,"row":{"id":5,"value":"newestvalue_5"}} {"operation":0,"originaltransaction":3,"bucket":536870913,"rowid":3,"currenttransaction":3,"row":{"id":7,"value":"newestvalue_7"}} {"operation":0,"originaltransaction":3,"bucket":536870914,"rowid":0,"currenttransaction":3,"row":{"id":8,"value":"value_8"}} ``` Order seems to be valid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695795) Time Spent: 40m (was: 0.5h) > DEFAULT keyword handling in MERGE UPDATE clause issues > -- > > Key: HIVE-21172 > URL: https://issues.apache.org/jira/browse/HIVE-21172 > Project: Hive > Issue Type: Sub-task > Components: SQL, Transactions >Affects Versions: 4.0.0 >Reporter: Eugene Koifman >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > once HIVE-21159 lands, enable {{HiveConf.MERGE_SPLIT_UPDATE}} and run these > tests. > TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_stats] > mvn test -Dtest=TestMiniLlapLocalCliDriver > -Dqfile=insert_into_default_keyword.q > Merge is rewritten as a multi-insert. When Update clause has
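The expected sort order discussed in the review comment above (originalTransactionId, then bucketProperty, then rowId, as described in HIVE-25257) can be expressed as a comparator. The following is a self-contained sketch with an illustrative stand-in key class, not Hive's actual RecordIdentifier:

```java
import java.util.Comparator;

public class AcidSortOrderSketch {
    // Minimal stand-in for an ACID row key as seen in the orc-tools dump above.
    static final class RowKey {
        final long originalTransaction;
        final int bucket;
        final long rowId;

        RowKey(long originalTransaction, int bucket, long rowId) {
            this.originalTransaction = originalTransaction;
            this.bucket = bucket;
            this.rowId = rowId;
        }
    }

    // Order used to verify compacted bucket files: by originalTransactionId,
    // then bucketProperty, then rowId.
    static final Comparator<RowKey> ACID_ORDER =
            Comparator.<RowKey>comparingLong(k -> k.originalTransaction)
                      .thenComparingInt(k -> k.bucket)
                      .thenComparingLong(k -> k.rowId);
}
```

Sorting the rows of the dumped bucket file with this comparator reproduces the order shown in the dump, which is what the reviewer manually verified.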
[jira] [Work logged] (HIVE-21172) DEFAULT keyword handling in MERGE UPDATE clause issues
[ https://issues.apache.org/jira/browse/HIVE-21172?focusedWorklogId=695748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695748 ] ASF GitHub Bot logged work on HIVE-21172: - Author: ASF GitHub Bot Created on: 14/Dec/21 13:16 Start Date: 14/Dec/21 13:16 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2857: URL: https://github.com/apache/hive/pull/2857#discussion_r768652584 ## File path: ql/src/test/results/clientpositive/llap/masking_acid_no_masking.q.out ## @@ -54,8 +53,9 @@ POSTHOOK: Input: default@masking_acid_no_masking POSTHOOK: Input: default@nonacid_n0 POSTHOOK: Output: default@masking_acid_no_masking POSTHOOK: Output: default@masking_acid_no_masking -POSTHOOK: Output: default@masking_acid_no_masking POSTHOOK: Output: default@merge_tmp_table POSTHOOK: Lineage: masking_acid_no_masking.key SIMPLE [(nonacid_n0)s.FieldSchema(name:key, type:int, comment:null), ] +POSTHOOK: Lineage: masking_acid_no_masking.key SIMPLE [(nonacid_n0)s.FieldSchema(name:key, type:int, comment:null), ] Review comment: These lineages are generated by the MoveTask when inserting. By turning on `hive.merge.split.update`, the update branch of a merge statement is split into an insert branch and a delete branch. Originally this merge had only one insert branch, but now it has two insert branches into the same table and columns: - one for the insert clause - one for the update clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695748) Time Spent: 0.5h (was: 20m) > DEFAULT keyword handling in MERGE UPDATE clause issues > -- > > Key: HIVE-21172 > URL: https://issues.apache.org/jira/browse/HIVE-21172 > Project: Hive > Issue Type: Sub-task > Components: SQL, Transactions >Affects Versions: 4.0.0 >Reporter: Eugene Koifman >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > once HIVE-21159 lands, enable {{HiveConf.MERGE_SPLIT_UPDATE}} and run these > tests. > TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_stats] > mvn test -Dtest=TestMiniLlapLocalCliDriver > -Dqfile=insert_into_default_keyword.q > Merge is rewritten as a multi-insert. When Update clause has DEFAULT, it's > not properly replaced with a value in the multi-insert - it's treated as a > literal > {noformat} > INSERT INTO `default`.`acidTable`-- update clause(insert part) > SELECT `t`.`key`, `DEFAULT`, `t`.`value` >WHERE `t`.`key` = `s`.`key` AND `s`.`key` > 3 AND NOT(`s`.`key` < 3) > {noformat} > See {{LOG.info("Going to reparse <" + originalQuery + "> as \n<" + > rewrittenQueryStr.toString() + ">");}} in hive.log > {{MergeSemanticAnalyzer.replaceDefaultKeywordForMerge()}} is only called in > {{handleInsert}} but not {{handleUpdate()}}. Why does the issue only show up with > {{MERGE_SPLIT_UPDATE}}? > Once this is fixed, HiveConf.MERGE_SPLIT_UPDATE should be true by default -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25804) Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening
[ https://issues.apache.org/jira/browse/HIVE-25804?focusedWorklogId=695723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695723 ] ASF GitHub Bot logged work on HIVE-25804: - Author: ASF GitHub Bot Created on: 14/Dec/21 12:55 Start Date: 14/Dec/21 12:55 Worklog Time Spent: 10m Work Description: csjuhasz-c opened a new pull request #2874: URL: https://github.com/apache/hive/pull/2874 HIVE-25804: Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening ### What changes were proposed in this pull request? Update log4j version to 2.16.0 ### Why are the changes needed? To incorporate further changes related to CVE-2021-44228. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695723) Time Spent: 20m (was: 10m) > Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 > hardening > --- > > Key: HIVE-25804 > URL: https://issues.apache.org/jira/browse/HIVE-25804 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Csaba Juhász >Assignee: Csaba Juhász >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4 > https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25804) Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening
[ https://issues.apache.org/jira/browse/HIVE-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25804: -- Labels: pull-request-available (was: ) > Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 > hardening > --- > > Key: HIVE-25804 > URL: https://issues.apache.org/jira/browse/HIVE-25804 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Csaba Juhász >Assignee: Csaba Juhász >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4 > https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25804) Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening
[ https://issues.apache.org/jira/browse/HIVE-25804?focusedWorklogId=695722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695722 ] ASF GitHub Bot logged work on HIVE-25804: - Author: ASF GitHub Bot Created on: 14/Dec/21 12:53 Start Date: 14/Dec/21 12:53 Worklog Time Spent: 10m Work Description: csjuhasz-c closed pull request #2871: URL: https://github.com/apache/hive/pull/2871 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695722) Remaining Estimate: 0h Time Spent: 10m > Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 > hardening > --- > > Key: HIVE-25804 > URL: https://issues.apache.org/jira/browse/HIVE-25804 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Csaba Juhász >Assignee: Csaba Juhász >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4 > https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-25804) Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening
[ https://issues.apache.org/jira/browse/HIVE-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25804 started by Csaba Juhász. --- > Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 > hardening > --- > > Key: HIVE-25804 > URL: https://issues.apache.org/jira/browse/HIVE-25804 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Csaba Juhász >Assignee: Csaba Juhász >Priority: Major > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4 > https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25804) Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 hardening
[ https://issues.apache.org/jira/browse/HIVE-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Juhász reassigned HIVE-25804: --- Assignee: Csaba Juhász > Update log4j2 version to 2.16.0 to incorporate further CVE-2021-44228 > hardening > --- > > Key: HIVE-25804 > URL: https://issues.apache.org/jira/browse/HIVE-25804 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Csaba Juhász >Assignee: Csaba Juhász >Priority: Major > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4 > https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695672 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 12:04 Start Date: 14/Dec/21 12:04 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768598009 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException { } PlanMapper newPlanMapper = coreDriver.getPlanMapper(); - if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) { + if (!explainReOptimization && + !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) { LOG.info("re-running the query would probably not yield better results; returning with last error"); // FIXME: retain old error; or create a new one? return cpr; } } } - private void afterExecute(PlanMapper planMapper, boolean success) { -for (IReExecutionPlugin p : plugins) { - p.afterExecute(planMapper, success); -} - } - - private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) { -boolean ret = false; -for (IReExecutionPlugin p : plugins) { - boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper); - LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute); - ret |= shouldReExecute; -} -return ret; - } - - private boolean shouldReExecute() { -boolean ret = false; -for (IReExecutionPlugin p : plugins) { - boolean shouldReExecute = p.shouldReExecute(executionIndex); - LOG.debug("{}.shouldReExecute = {}", p, shouldReExecute); Review comment: Same as above -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695672) Time Spent: 2h 40m (was: 2.5h) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695671 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 12:04 Start Date: 14/Dec/21 12:04 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768597811 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException { } PlanMapper newPlanMapper = coreDriver.getPlanMapper(); - if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) { + if (!explainReOptimization && + !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) { LOG.info("re-running the query would probably not yield better results; returning with last error"); // FIXME: retain old error; or create a new one? return cpr; } } } - private void afterExecute(PlanMapper planMapper, boolean success) { -for (IReExecutionPlugin p : plugins) { - p.afterExecute(planMapper, success); -} - } - - private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) { -boolean ret = false; -for (IReExecutionPlugin p : plugins) { - boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper); - LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute); Review comment: TBH, I am unsure here. We can keep: - `shouldReExecuteAfterCompile` - `shouldReExecute` - `shouldReCompile` Or, we can replace with a stream version: ``` plugins.stream() .peek(p -> LOG.debug("{}.shouldReCompile = {}", p)) .anyMatch(p -> p.shouldReCompile(currentIndex)) ``` Or we can omit the logs, and use only: ``` plugins.stream().anyMatch(p -> p.shouldReCompile(currentIndex)) ``` Your thoughts? 
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695671) Time Spent: 2.5h (was: 2h 20m) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
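The review comment above weighs keeping the per-plugin debug logs against a `peek(...).anyMatch(...)` stream. One behavioral caveat worth noting: `anyMatch` short-circuits, so a `peek` in that pipeline only observes elements up to and including the first match, and the debug log would no longer cover every plugin. A small self-contained demonstration (generic booleans standing in for plugin results):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamShortCircuit {
    /**
     * Counts how many elements peek() actually observes before anyMatch()
     * short-circuits. With the original for-loop accumulator (ret |= ...),
     * every element would have been visited and logged.
     */
    static int elementsSeen(List<Boolean> results) {
        AtomicInteger seen = new AtomicInteger();
        results.stream()
            .peek(r -> seen.incrementAndGet()) // stand-in for LOG.debug per plugin
            .anyMatch(r -> r);                 // stops at the first true
        return seen.get();
    }
}
```

So the three options in the comment trade completeness of the logs for brevity: the plain `anyMatch` version is shortest but stops invoking (and logging) plugins after the first one that votes to re-execute.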
[jira] [Resolved] (HIVE-25807) ok
[ https://issues.apache.org/jira/browse/HIVE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar resolved HIVE-25807. - Release Note: ok Resolution: Fixed > ok > -- > > Key: HIVE-25807 > URL: https://issues.apache.org/jira/browse/HIVE-25807 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25807) ok
[ https://issues.apache.org/jira/browse/HIVE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459109#comment-17459109 ] Pravin Pawar commented on HIVE-25807: - ok > ok > -- > > Key: HIVE-25807 > URL: https://issues.apache.org/jira/browse/HIVE-25807 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-25807) ok
[ https://issues.apache.org/jira/browse/HIVE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25807 started by Pravin Pawar. --- > ok > -- > > Key: HIVE-25807 > URL: https://issues.apache.org/jira/browse/HIVE-25807 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25807) ok
[ https://issues.apache.org/jira/browse/HIVE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Pawar reassigned HIVE-25807: --- Assignee: Pravin Pawar > ok > -- > > Key: HIVE-25807 > URL: https://issues.apache.org/jira/browse/HIVE-25807 > Project: Hive > Issue Type: Bug >Reporter: Pravin Pawar >Assignee: Pravin Pawar >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695667 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:57 Start Date: 14/Dec/21 11:57 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768592941 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException { } PlanMapper newPlanMapper = coreDriver.getPlanMapper(); - if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) { + if (!explainReOptimization && + !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) { LOG.info("re-running the query would probably not yield better results; returning with last error"); // FIXME: retain old error; or create a new one? return cpr; } } } - private void afterExecute(PlanMapper planMapper, boolean success) { -for (IReExecutionPlugin p : plugins) { - p.afterExecute(planMapper, success); -} - } - - private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) { -boolean ret = false; -for (IReExecutionPlugin p : plugins) { - boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper); Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695667) Time Spent: 2h 20m (was: 2h 10m) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695666 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:57 Start Date: 14/Dec/21 11:57 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768592824 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException { } PlanMapper oldPlanMapper = coreDriver.getPlanMapper(); - afterExecute(oldPlanMapper, cpr != null); + final boolean success = cpr != null; + plugins.forEach(p -> p.afterExecute(oldPlanMapper, success)); + + // If the execution was successful return the result + if (success) { +return cpr; + } boolean shouldReExecute = explainReOptimization && executionIndex==1; - shouldReExecute |= cpr == null && shouldReExecute(); + shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex)); - if (executionIndex >= maxExecutuions || !shouldReExecute) { -if (cpr != null) { - return cpr; -} else { - throw cpe; -} + if (executionIndex >= maxExecutions || !shouldReExecute) { +// If we do not have to reexecute, return the last error +throw cpe; } + LOG.info("Preparing to re-execute query"); - prepareToReExecute(); + plugins.forEach(IReExecutionPlugin::prepareToReExecute); + try { coreDriver.compileAndRespond(currentQuery); Review comment: Yeah, that's ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695666) Time Spent: 2h 10m (was: 2h) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695664 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:55 Start Date: 14/Dec/21 11:55 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768591706 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException { } PlanMapper oldPlanMapper = coreDriver.getPlanMapper(); - afterExecute(oldPlanMapper, cpr != null); + final boolean success = cpr != null; + plugins.forEach(p -> p.afterExecute(oldPlanMapper, success)); + + // If the execution was successful return the result + if (success) { +return cpr; + } boolean shouldReExecute = explainReOptimization && executionIndex==1; - shouldReExecute |= cpr == null && shouldReExecute(); + shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex)); - if (executionIndex >= maxExecutuions || !shouldReExecute) { -if (cpr != null) { - return cpr; -} else { - throw cpe; -} + if (executionIndex >= maxExecutions || !shouldReExecute) { +// If we do not have to reexecute, return the last error +throw cpe; } + LOG.info("Preparing to re-execute query"); - prepareToReExecute(); + plugins.forEach(IReExecutionPlugin::prepareToReExecute); Review comment: As discussed, most of them was just a loop, so I would keep this instead of having 6 methods for loops -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695664) Time Spent: 2h (was: 1h 50m) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
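The refactor discussed above replaces accumulator loops like `ret |= p.shouldReExecute(...)` with `plugins.stream().anyMatch(...)`. These are not strictly equivalent: the `|=` loop calls every plugin regardless of earlier results, while `anyMatch` stops at the first plugin that returns true, so any side effects in later plugins no longer run. A minimal sketch of the difference, using `BooleanSupplier` as a stand-in for a plugin callback:

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class LoopVsAnyMatch {
    // Original style: the accumulator loop invokes every plugin,
    // even after one has already returned true.
    static int callsWithLoop(List<BooleanSupplier> plugins) {
        boolean ret = false;
        int calls = 0;
        for (BooleanSupplier p : plugins) {
            calls++;
            ret |= p.getAsBoolean();
        }
        return calls;
    }

    // Stream style: anyMatch short-circuits at the first plugin returning true.
    static int callsWithAnyMatch(List<BooleanSupplier> plugins) {
        int[] calls = {0};
        plugins.stream().anyMatch(p -> {
            calls[0]++;
            return p.getAsBoolean();
        });
        return calls[0];
    }

    // Helper for demonstration: first plugin votes yes, second would not be reached.
    static List<BooleanSupplier> demoPlugins() {
        return List.of(() -> true, () -> false);
    }
}
```

For pure predicates like `shouldReExecute` the short-circuit is harmless (and slightly cheaper); it only matters if a plugin's decision method has observable side effects.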
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695663 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:53 Start Date: 14/Dec/21 11:53 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768590360 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java ## @@ -115,14 +115,48 @@ public ReExecDriver(QueryState queryState, QueryInfo queryInfo, ArrayList Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695655=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695655 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:32 Start Date: 14/Dec/21 11:32 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768575045 ## File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java ## @@ -42,24 +42,72 @@ /** * Called before executing the query. */ - void beforeExecute(int executionIndex, boolean explainReOptimization); + default void beforeExecute(int executionIndex, boolean explainReOptimization) { +// default noop + } /** * The query have failed, does this plugin advises to re-execute it again? */ - boolean shouldReExecute(int executionNum); + default boolean shouldReExecute(int executionNum) { Review comment: We discussed, and renamed the other method -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695655) Time Spent: 1h 40m (was: 1.5h) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path
[ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=695652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695652 ] ASF GitHub Bot logged work on HIVE-25792: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:26 Start Date: 14/Dec/21 11:26 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2865: URL: https://github.com/apache/hive/pull/2865#discussion_r768571234 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true, "Enable query reexecutions"), -HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit", +HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", +"overlay,reoptimize,reexecute_lost_am,dagsubmit,reexecute_cbo", "comma separated list of plugin can be used:\n" + " overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n" + " reoptimize: collects operator statistics during execution and recompile the query after a failure\n" ++ " reexecute_cbo: reexecutes query after a CBO failure\n" Review comment: Renamed to `recompile_without_cbo` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695652) Time Spent: 1.5h (was: 1h 20m) > Multi Insert query fails on CBO path > - > > Key: HIVE-25792 > URL: https://issues.apache.org/jira/browse/HIVE-25792 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > {code} > set hive.cbo.enable=true; > drop table if exists aa1; > drop table if exists bb1; > drop table if exists cc1; > drop table if exists dd1; > drop table if exists ee1; > drop table if exists ff1; > create table aa1 ( stf_id string); > create table bb1 ( stf_id string); > create table cc1 ( stf_id string); > create table ff1 ( x string); > explain > from ff1 as a join cc1 as b > insert overwrite table aa1 select stf_id GROUP BY b.stf_id > insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id > ; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25806) Possible leak in LlapCacheAwareFs - Parquet, LLAP IO
[ https://issues.apache.org/jira/browse/HIVE-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25806: -- Labels: pull-request-available (was: ) > Possible leak in LlapCacheAwareFs - Parquet, LLAP IO > > > Key: HIVE-25806 > URL: https://issues.apache.org/jira/browse/HIVE-25806 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > there is an inputstream there which is never closed: > https://github.com/apache/hive/blob/9f9844dbc881e2a9267c259b8c04e7787f7fadc4/ql/src/java/org/apache/hadoop/hive/llap/LlapCacheAwareFs.java#L243 > my understanding is that in an InputStream chain, every InputStream is > responsible for closing its enclosed InputStream, here the chain is like: > DelegatingSeekableInputStream -> io.DataInputStream -> > LlapCacheAwareFs$CacheAwareInputStream -> io.DataInputStream -> > crypto.CryptoInputStream -> hdfs.DFSInputStream > {code} > at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:106) > at > sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) > at java.nio.channels.SocketChannel.open(SocketChannel.java:145) > at > org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2933) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836) > at > org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:183) > at java.io.DataInputStream.readFully(DataInputStream.java:195) > at > org.apache.hadoop.hive.llap.LlapCacheAwareFs$CacheAwareInputStream.read(LlapCacheAwareFs.java:264) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91) > at > org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174) > at > org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:429) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:407) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:359) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:93) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:117) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
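The convention HIVE-25806 relies on, that each wrapper in an `InputStream` chain closes the stream it encloses, can be shown with a tiny self-contained example: `java.io.FilterInputStream.close()` delegates to the wrapped stream, so closing the outermost stream (here via try-with-resources) propagates all the way in. If any link in the chain fails to do this, the innermost stream and its underlying socket or file handle leak, which is the suspected bug in `LlapCacheAwareFs$CacheAwareInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClosePropagation {
    /** An in-memory stream that records whether close() was ever called on it. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackingStream(byte[] b) { super(b); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Closes only the outer wrapper and reports whether the inner stream was closed too. */
    static boolean innerClosedAfterOuterClose() {
        TrackingStream inner = new TrackingStream(new byte[]{1, 2, 3});
        // FilterInputStream.close() closes the enclosed stream, so the
        // try-with-resources on the wrapper reaches the inner stream as well.
        try (InputStream outer = new FilterInputStream(inner) { }) {
            outer.read();
        } catch (IOException e) {
            return false;
        }
        return inner.closed;
    }
}
```

In the chain quoted in the ticket, the same propagation is what should eventually close the `DFSInputStream` and release its TCP socket.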
[jira] [Work logged] (HIVE-25806) Possible leak in LlapCacheAwareFs - Parquet, LLAP IO
[ https://issues.apache.org/jira/browse/HIVE-25806?focusedWorklogId=695649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695649 ] ASF GitHub Bot logged work on HIVE-25806: - Author: ASF GitHub Bot Created on: 14/Dec/21 11:25 Start Date: 14/Dec/21 11:25 Worklog Time Spent: 10m Work Description: abstractdog opened a new pull request #2873: URL: https://github.com/apache/hive/pull/2873 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695649) Remaining Estimate: 0h Time Spent: 10m > Possible leak in LlapCacheAwareFs - Parquet, LLAP IO > > > Key: HIVE-25806 > URL: https://issues.apache.org/jira/browse/HIVE-25806 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > there is an inputstream there which is never closed: > https://github.com/apache/hive/blob/9f9844dbc881e2a9267c259b8c04e7787f7fadc4/ql/src/java/org/apache/hadoop/hive/llap/LlapCacheAwareFs.java#L243 > my understanding is that in an InputStream chain, every InputStream is > responsible for closing its enclosed InputStream, here the chain is like: > DelegatingSeekableInputStream -> io.DataInputStream -> > LlapCacheAwareFs$CacheAwareInputStream -> io.DataInputStream -> > crypto.CryptoInputStream -> hdfs.DFSInputStream > {code} > at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:106) > at > sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) > at 
java.nio.channels.SocketChannel.open(SocketChannel.java:145) > at > org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2933) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836) > at > org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:183) > at java.io.DataInputStream.readFully(DataInputStream.java:195) > at > org.apache.hadoop.hive.llap.LlapCacheAwareFs$CacheAwareInputStream.read(LlapCacheAwareFs.java:264) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91) > at > org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174) > at > org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:429) > at > 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:407) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:359) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:93) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > at >
[jira] [Assigned] (HIVE-25783) Provide rat check to the CI
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng reassigned HIVE-25783: -- Assignee: Zhihua Deng > Provide rat check to the CI > --- > > Key: HIVE-25783 > URL: https://issues.apache.org/jira/browse/HIVE-25783 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > > This Jira investigates whether we can add a rat check to the CI, to make sure that newly added source files contain the ASF license information. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25806) Possible leak in LlapCacheAwareFs - Parquet, LLAP IO
[ https://issues.apache.org/jira/browse/HIVE-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-25806: Summary: Possible leak in LlapCacheAwareFs - Parquet, LLAP IO (was: Possible leak in LlapCacheAwareFs - parquet,llapio) > Possible leak in LlapCacheAwareFs - Parquet, LLAP IO > > > Key: HIVE-25806 > URL: https://issues.apache.org/jira/browse/HIVE-25806 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > there is an inputstream there which is never closed: > https://github.com/apache/hive/blob/9f9844dbc881e2a9267c259b8c04e7787f7fadc4/ql/src/java/org/apache/hadoop/hive/llap/LlapCacheAwareFs.java#L243 > my understanding is that in an InputStream chain, every InputStream is > responsible for closing its enclosed InputStream, here the chain is like: > DelegatingSeekableInputStream -> io.DataInputStream -> > LlapCacheAwareFs$CacheAwareInputStream -> io.DataInputStream -> > crypto.CryptoInputStream -> hdfs.DFSInputStream > {code} > at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:106) > at > sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60) > at java.nio.channels.SocketChannel.open(SocketChannel.java:145) > at > org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2933) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836) > at > org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:183) > at java.io.DataInputStream.readFully(DataInputStream.java:195) > at > org.apache.hadoop.hive.llap.LlapCacheAwareFs$CacheAwareInputStream.read(LlapCacheAwareFs.java:264) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127) > at > org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91) > at > org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174) > at > org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:429) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:407) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:359) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:93) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:117) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at >
[jira] [Updated] (HIVE-25806) Possible leak in LlapCacheAwareFs - parquet,llapio
[ https://issues.apache.org/jira/browse/HIVE-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-25806: Description:
there is an inputstream there which is never closed:
https://github.com/apache/hive/blob/9f9844dbc881e2a9267c259b8c04e7787f7fadc4/ql/src/java/org/apache/hadoop/hive/llap/LlapCacheAwareFs.java#L243
my understanding is that in an InputStream chain, every InputStream is responsible for closing its enclosed InputStream; here the chain is like:
DelegatingSeekableInputStream -> io.DataInputStream -> LlapCacheAwareFs$CacheAwareInputStream -> io.DataInputStream -> crypto.CryptoInputStream -> hdfs.DFSInputStream
{code}
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:106)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2933)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:183)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at org.apache.hadoop.hive.llap.LlapCacheAwareFs$CacheAwareInputStream.read(LlapCacheAwareFs.java:264)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:429)
at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:407)
at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:359)
at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:93)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:117)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at
[jira] [Assigned] (HIVE-25806) Possible leak in LlapCacheAwareFs - parquet,llapio
[ https://issues.apache.org/jira/browse/HIVE-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-25806: --- Assignee: László Bodor > Possible leak in LlapCacheAwareFs - parquet,llapio > -- > > Key: HIVE-25806 > URL: https://issues.apache.org/jira/browse/HIVE-25806 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
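The close-propagation convention the report describes can be sketched with hypothetical stand-in classes (`TrackedStream` and `CacheAwareWrapper` are illustrative names, not Hive's actual `LlapCacheAwareFs` code): each wrapper's `close()` must close the stream it wraps, so one `close()` call on the outermost stream releases the socket-backed stream at the bottom of the chain — the behavior that is missing when an intermediate stream is never closed.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamChainDemo {

    // Stand-in for a socket-backed stream such as DFSInputStream.
    static class TrackedStream extends InputStream {
        boolean closed = false;
        @Override public int read() { return -1; }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // FilterInputStream.close() delegates to the wrapped stream; a
    // cache-aware wrapper must preserve that delegation rather than
    // dropping the enclosed stream (the suspected leak).
    static class CacheAwareWrapper extends FilterInputStream {
        CacheAwareWrapper(InputStream in) { super(in); }
    }

    public static void main(String[] args) throws IOException {
        TrackedStream inner = new TrackedStream();
        // Two wrapper layers, mirroring the multi-layer chain in the report.
        InputStream chain = new CacheAwareWrapper(new CacheAwareWrapper(inner));
        chain.close();                    // closes every layer underneath
        System.out.println(inner.closed); // prints "true"
    }
}
```

When any layer overrides `close()` without delegating, the `DFSInputStream` at the bottom keeps its TCP peer open, which matches the socket buildup visible in the stack trace above.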
[jira] [Updated] (HIVE-25805) Wrong result when rebuilding MV with count(col) incrementally
[ https://issues.apache.org/jira/browse/HIVE-25805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25805: -- Labels: pull-request-available (was: ) > Wrong result when rebuilding MV with count(col) incrementally > - > > Key: HIVE-25805 > URL: https://issues.apache.org/jira/browse/HIVE-25805 > Project: Hive > Issue Type: Bug > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > create table t1(a char(15), b int) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a, b) values ('old', 1); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select t1.a, count(t1.b), count(*) from t1 group by t1.a; > delete from t1 where b = 1; > insert into t1(a,b) values > ('new', null); > alter materialized view mat1 rebuild; > select * from mat1; > {code} > returns > {code:java} > new 1 1 > {code} > but, should be > {code:java} > new 0 1 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25805) Wrong result when rebuilding MV with count(col) incrementally
[ https://issues.apache.org/jira/browse/HIVE-25805?focusedWorklogId=695631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695631 ] ASF GitHub Bot logged work on HIVE-25805: - Author: ASF GitHub Bot Created on: 14/Dec/21 10:47 Start Date: 14/Dec/21 10:47 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #2872: URL: https://github.com/apache/hive/pull/2872
### What changes were proposed in this pull request?
When generating the incremental rebuild plan for MVs that have aggregates and delete operations on any source table, check whether the view definition contains a `count` aggregate function that takes an argument. If it does, add an expression that checks whether that argument is `null`.
### Why are the changes needed?
Records with `null` values should not be counted in the final aggregation.
### Does this PR introduce _any_ user-facing change?
Yes. This patch fixes a data correctness issue.
### How was this patch tested?
```
mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_6.q -pl itests/qtest -Pitests
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695631) Remaining Estimate: 0h Time Spent: 10m > Wrong result when rebuilding MV with count(col) incrementally > - > > Key: HIVE-25805 > URL: https://issues.apache.org/jira/browse/HIVE-25805 > Project: Hive > Issue Type: Bug > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > create table t1(a char(15), b int) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a, b) values ('old', 1); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select t1.a, count(t1.b), count(*) from t1 group by t1.a; > delete from t1 where b = 1; > insert into t1(a,b) values > ('new', null); > alter materialized view mat1 rebuild; > select * from mat1; > {code} > returns > {code:java} > new 1 1 > {code} > but, should be > {code:java} > new 0 1 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
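The correctness issue above comes down to SQL count semantics: `count(*)` counts every row, while `count(col)` must skip rows where `col` is NULL. A minimal Java sketch of the two semantics (illustrative only, not Hive's incremental-rebuild code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class CountSemantics {
    // count(*): every row counts, null or not.
    static long countStar(List<Integer> col) {
        return col.size();
    }

    // count(col): rows where col is null are excluded.
    static long countCol(List<Integer> col) {
        return col.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // After the delete+insert in the repro, t1 holds one row: ('new', null).
        List<Integer> b = Arrays.asList((Integer) null);
        // Matches the expected MV row "new 0 1" from the issue description.
        System.out.println(countCol(b) + " " + countStar(b)); // prints "0 1"
    }
}
```

The buggy incremental rebuild effectively computed `countStar` for both columns; the fix adds the null check so the `count(t1.b)` column stays at 0.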
[jira] [Updated] (HIVE-25805) Wrong result when rebuilding MV with count(col) incrementally
[ https://issues.apache.org/jira/browse/HIVE-25805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25805: -- Summary: Wrong result when rebuilding MV with count(col) incrementally (was: Wrong result when rebuilding MV with count(col) incremental) > Wrong result when rebuilding MV with count(col) incrementally > - > > Key: HIVE-25805 > URL: https://issues.apache.org/jira/browse/HIVE-25805 > Project: Hive > Issue Type: Bug > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > {code:java} > create table t1(a char(15), b int) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a, b) values ('old', 1); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select t1.a, count(t1.b), count(*) from t1 group by t1.a; > delete from t1 where b = 1; > insert into t1(a,b) values > ('new', null); > alter materialized view mat1 rebuild; > select * from mat1; > {code} > returns > {code:java} > new 1 1 > {code} > but, should be > {code:java} > new 0 1 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25805) Wrong result when rebuilding MV with count(col) incremental
[ https://issues.apache.org/jira/browse/HIVE-25805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25805: - > Wrong result when rebuilding MV with count(col) incremental > --- > > Key: HIVE-25805 > URL: https://issues.apache.org/jira/browse/HIVE-25805 > Project: Hive > Issue Type: Bug > Components: CBO, Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > {code:java} > create table t1(a char(15), b int) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a, b) values ('old', 1); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select t1.a, count(t1.b), count(*) from t1 group by t1.a; > delete from t1 where b = 1; > insert into t1(a,b) values > ('new', null); > alter materialized view mat1 rebuild; > select * from mat1; > {code} > returns > {code:java} > new 1 1 > {code} > but, should be > {code:java} > new 0 1 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-21172) DEFAULT keyword handling in MERGE UPDATE clause issues
[ https://issues.apache.org/jira/browse/HIVE-21172?focusedWorklogId=695610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695610 ] ASF GitHub Bot logged work on HIVE-21172: - Author: ASF GitHub Bot Created on: 14/Dec/21 09:56 Start Date: 14/Dec/21 09:56 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2857: URL: https://github.com/apache/hive/pull/2857#discussion_r768476035 ## File path: ql/src/test/results/clientpositive/llap/masking_acid_no_masking.q.out ## @@ -54,8 +53,9 @@ POSTHOOK: Input: default@masking_acid_no_masking POSTHOOK: Input: default@nonacid_n0 POSTHOOK: Output: default@masking_acid_no_masking POSTHOOK: Output: default@masking_acid_no_masking -POSTHOOK: Output: default@masking_acid_no_masking POSTHOOK: Output: default@merge_tmp_table POSTHOOK: Lineage: masking_acid_no_masking.key SIMPLE [(nonacid_n0)s.FieldSchema(name:key, type:int, comment:null), ] +POSTHOOK: Lineage: masking_acid_no_masking.key SIMPLE [(nonacid_n0)s.FieldSchema(name:key, type:int, comment:null), ] Review comment: is this a duplicate? ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java ## @@ -1711,13 +1713,13 @@ public void testMajorCompactionAfterTwoMergeStatements() throws Exception { // Verify contents of bucket files. 
List expectedRsBucket0 = Arrays.asList("{\"writeid\":1,\"bucketid\":536870912,\"rowid\":3}\t4\tvalue_4", -"{\"writeid\":2,\"bucketid\":536870912,\"rowid\":0}\t6\tvalue_6", -"{\"writeid\":2,\"bucketid\":536870913,\"rowid\":2}\t3\tnewvalue_3", -"{\"writeid\":3,\"bucketid\":536870912,\"rowid\":0}\t8\tvalue_8", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":0}\t5\tnewestvalue_5", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":1}\t7\tnewestvalue_7", -"{\"writeid\":3,\"bucketid\":536870913,\"rowid\":2}\t1\tnewestvalue_1", - "{\"writeid\":3,\"bucketid\":536870913,\"rowid\":3}\t2\tnewestvalue_2"); +"{\"writeid\":2,\"bucketid\":536870913,\"rowid\":2}\t3\tnewvalue_3", Review comment: seeing a change like this I wonder how much value this test adds... ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java ## @@ -441,6 +442,11 @@ private String handleUpdate(ASTNode whenMatchedUpdateClause, StringBuilder rewri default: //do nothing } + +if ("`default`".equalsIgnoreCase(rhsExp.trim())) { + rhsExp = MapUtils.getString(colNameToDefaultConstraint, name, "null"); Review comment: iiuc this makes changes the column value to the default if we see the "default" in the query I see that this also work for a plain insert: ``` create table q2(a string default 'asd'); insert into q2 values(`default`) select * from q2; ``` however I think the standard suggest to use the `DEFAULT` keyword and not as a string literal; in Hive we seem to also "support" the "default" as a string literal to be interpreted as default. I think because of various reasons - the default keyword becomes the string default at some point and it works like that right now. could you open a follow-up to fix the `default` literal's handling? 
## File path: ql/src/test/results/clientpositive/llap/explain_locks.q.out ## @@ -233,20 +221,14 @@ POSTHOOK: Input: default@target@p=2/q=2 POSTHOOK: Output: default@merge_tmp_table POSTHOOK: Output: default@target POSTHOOK: Output: default@target@p=1/q=2 -POSTHOOK: Output: default@target@p=1/q=2 -POSTHOOK: Output: default@target@p=1/q=3 POSTHOOK: Output: default@target@p=1/q=3 POSTHOOK: Output: default@target@p=2/q=2 -POSTHOOK: Output: default@target@p=2/q=2 LOCK INFORMATION: default.source -> SHARED_READ default.target.p=1/q=2 -> SHARED_READ default.target.p=1/q=3 -> SHARED_READ default.target.p=2/q=2 -> SHARED_READ default.target.p=2/q=2 -> SHARED_WRITE -default.target.p=2/q=2 -> SHARED_WRITE Review comment: I wonder why were these lock duplicated? ## File path: ql/src/test/results/clientpositive/llap/acid_direct_update_delete_with_merge.q.out ## @@ -112,11 +110,13 @@ POSTHOOK: Input: default@transactions@tran_date=20170413 POSTHOOK: Output: default@merge_tmp_table POSTHOOK: Output: default@transactions POSTHOOK: Output: default@transactions@tran_date=20170410 -POSTHOOK: Output: default@transactions@tran_date=20170410 POSTHOOK: Output: default@transactions@tran_date=20170413 POSTHOOK: Output: default@transactions@tran_date=20170413 POSTHOOK: Output: default@transactions@tran_date=20170415 POSTHOOK: Lineage: merge_tmp_table.val EXPRESSION [(transactions)t.FieldSchema(name:ROW__ID, type:struct, comment:), (transactions)t.FieldSchema(name:tran_date,
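The snippet under review substitutes the column's default constraint when the right-hand side of a SET clause is the `` `default` `` literal. A hedged sketch of that rewrite step, using plain `java.util.Map` in place of commons-collections `MapUtils.getString` (class and method names here are hypothetical, not Hive's exact code):

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultRewriteSketch {
    // If rhsExp is the `default` keyword, replace it with the column's
    // default constraint; fall back to the literal "null" when the column
    // has no default. Otherwise leave the expression unchanged.
    static String rewriteRhs(String rhsExp, String colName,
                             Map<String, String> colNameToDefaultConstraint) {
        if ("`default`".equalsIgnoreCase(rhsExp.trim())) {
            // Equivalent to MapUtils.getString(map, key, "null").
            return colNameToDefaultConstraint.getOrDefault(colName, "null");
        }
        return rhsExp;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("a", "'asd'"); // e.g. column a has DEFAULT 'asd'
        System.out.println(rewriteRhs("`default`", "a", defaults)); // prints "'asd'"
        System.out.println(rewriteRhs("`default`", "b", defaults)); // prints "null"
        System.out.println(rewriteRhs("42", "a", defaults));        // prints "42"
    }
}
```

As the review notes, matching on the string form of the keyword is why a quoted `` `default` `` literal in a plain INSERT also triggers the substitution — the follow-up the reviewer asks for.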
[jira] [Work logged] (HIVE-25576) Add config to parse date with older date format
[ https://issues.apache.org/jira/browse/HIVE-25576?focusedWorklogId=695599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-695599 ] ASF GitHub Bot logged work on HIVE-25576: - Author: ASF GitHub Bot Created on: 14/Dec/21 09:34 Start Date: 14/Dec/21 09:34 Worklog Time Spent: 10m Work Description: zabetak commented on pull request #2690: URL: https://github.com/apache/hive/pull/2690#issuecomment-993349113 Apologies for the delay @ashish-kumar-sharma , I will try to rearrange this on my TODO list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 695599) Time Spent: 2h 10m (was: 2h)
> Add config to parse date with older date format
> ---
> Key: HIVE-25576
> URL: https://issues.apache.org/jira/browse/HIVE-25576
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2, 4.0.0
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> *History*
> *Hive 1.2* -
> VM time zone set to Asia/Bangkok
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 07:00:00
> *Implementation details* -
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html . The official documentation mentions that "Unfortunately, the API for these functions was not amenable to internationalization" and that the corresponding methods in Date are deprecated. Due to that, this path produces the wrong result.
> *Master branch* -
> set hive.local.time.zone=Asia/Bangkok;
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 06:42:04
> *Implementation details* -
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
>     .parseCaseInsensitive()
>     .appendPattern(pattern)
>     .toFormatter();
> ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter).withZoneSameInstant(ZoneId.of(timezone));
> Long dttime = zonedDateTime.toInstant().getEpochSecond();
> *Problem* -
> *SimpleDateFormat* has been replaced with *DateTimeFormatter*, which gives the correct result but is not backward compatible. This causes issues when migrating to the new version, because older data written with Hive 1.x or 2.x is not compatible with *DateTimeFormatter*.
> *Solution*
> Introduce a config "hive.legacy.timeParserPolicy" with the following values:
> 1. *True* - use *SimpleDateFormat*
> 2. *False* - use *DateTimeFormatter*
> Note: Apache Spark faces the same issue: https://issues.apache.org/jira/browse/SPARK-30668
-- This message was sent by Atlassian Jira (v8.20.1#820001)
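The two implementations contrasted in the issue can be run side by side (a sketch, assuming the intended pattern is `yyyy-MM-dd HH:mm:ss z`). Both APIs agree on the parsed UTC instant for this input; the user-visible discrepancy arises when that instant is rendered back in a zone such as Asia/Bangkok, where java.time applies the historical local-mean-time offset (+06:42:04 before 1920) while the legacy java.util classes use a fixed +07:00.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class LegacyDateParse {
    // Legacy path (Hive 1.x/2.x): SimpleDateFormat.
    static long legacyEpoch(String text, String pattern) {
        try {
            return new SimpleDateFormat(pattern).parse(text).getTime() / 1000;
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Current path: DateTimeFormatter, as in the master-branch snippet.
    static long javaTimeEpoch(String text, String pattern) {
        DateTimeFormatter dtf = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(pattern)
                .toFormatter();
        return ZonedDateTime.parse(text, dtf)
                .withZoneSameInstant(ZoneId.of("UTC"))
                .toInstant().getEpochSecond();
    }

    public static void main(String[] args) {
        String text = "1800-01-01 00:00:00 UTC";
        String pattern = "yyyy-MM-dd HH:mm:ss z";
        // Both parse to the same UTC instant; the divergence reported in the
        // issue comes from formatting this instant in Asia/Bangkok afterwards.
        System.out.println(legacyEpoch(text, pattern) + " " + javaTimeEpoch(text, pattern));
    }
}
```

This is why a parser-policy config is proposed rather than a parse-side fix: the instant is the same, but round-tripping old data through the new formatter's zone rules changes the rendered text.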
[jira] [Commented] (HIVE-25783) Provide rat check to the CI
[ https://issues.apache.org/jira/browse/HIVE-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459000#comment-17459000 ] Zhihua Deng commented on HIVE-25783: I will take a look. Thank you for the information. > Provide rat check to the CI > --- > > Key: HIVE-25783 > URL: https://issues.apache.org/jira/browse/HIVE-25783 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Reporter: Zhihua Deng >Priority: Major > > This Jira investigates whether we can add a rat check to the CI, to make sure that newly added source files contain the ASF license information. -- This message was sent by Atlassian Jira (v8.20.1#820001)