[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457524 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 05:09 Start Date: 11/Jul/20 05:09 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1222: URL: https://github.com/apache/hive/pull/1222#issuecomment-656991606 I am not entirely comfortable with my knowledge around the area, but tried to do my best when reviewing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457524) Time Spent: 2h 20m (was: 2h 10m) > Clean up Driver > --- > > Key: HIVE-23814 > URL: https://issues.apache.org/jira/browse/HIVE-23814 > Project: Hive > Issue Type: Sub-task > Components: Hive > Reporter: Miklos Gergely > Assignee: Miklos Gergely > Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Driver is now cut down to its minimal size by extracting all of its sub-tasks to separate classes. The rest should be cleaned up by: > * moving out some smaller parts of the code to sub-task and utility classes wherever it is still possible > * cutting large functions into meaningful and manageable parts > * re-ordering the functions to follow the order of processing > * fixing checkstyle issues -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457523 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 05:04 Start Date: 11/Jul/20 05:04 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453156405 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); } - private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { -driverTxnHandler.createTxnManager(); -DriverState.setDriverState(driverState); -prepareContext(); -setQueryId(); Review comment: Won't we miss setting the query id? Issue Time Tracking --- Worklog Id: (was: 457523) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457522 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 05:03 Start Date: 11/Jul/20 05:03 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453156366 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); } - private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { -driverTxnHandler.createTxnManager(); -DriverState.setDriverState(driverState); Review comment: Won't we miss this state setting? Issue Time Tracking --- Worklog Id: (was: 457522) Time Spent: 2h (was: 1h 50m)
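The two review questions above concern `DriverState.setDriverState(driverState)`, which registers the driver's state for the current thread so that later code can find it via `DriverState.getDriverState()`. A minimal sketch of that thread-local registry pattern follows; the class and method names are illustrative stand-ins, not Hive's actual implementation:

```java
// Sketch of a thread-local state registry, the pattern that the
// setDriverState/getDriverState pair in the diff above suggests.
// All names here are illustrative, not Hive's real API.
public class StateRegistry {
    // Each thread that calls setCurrent sees only its own instance.
    private static final ThreadLocal<StateRegistry> CURRENT = new ThreadLocal<>();

    private volatile boolean aborted = false;

    public static void setCurrent(StateRegistry state) {
        CURRENT.set(state);
    }

    public static StateRegistry getCurrent() {
        return CURRENT.get();
    }

    public void abort() { aborted = true; }
    public boolean isAborted() { return aborted; }

    public static void main(String[] args) throws InterruptedException {
        StateRegistry state = new StateRegistry();
        StateRegistry.setCurrent(state);
        // The registering thread reads back its own instance...
        System.out.println(StateRegistry.getCurrent() == state); // true
        // ...but another thread sees null unless it registers too, which
        // is why every entry point (compile, run, ...) must set it again —
        // the concern behind "won't we miss this state setting?".
        Thread other = new Thread(() -> System.out.println(StateRegistry.getCurrent() == null));
        other.start();
        other.join();
    }
}
```

This illustrates why dropping a `setDriverState` call from one entry point is only safe if every path into that code has already registered the state on the same thread.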
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457519 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:56 Start Date: 11/Jul/20 04:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453155831 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); } - private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { -driverTxnHandler.createTxnManager(); -DriverState.setDriverState(driverState); -prepareContext(); -setQueryId(); + @Override + public CommandProcessorResponse run(String command) throws CommandProcessorException { +return run(command, false); + } -if (resetTaskIds) { - TaskFactory.resetId(); + private CommandProcessorResponse run(String command, boolean alreadyCompiled) throws CommandProcessorException { +try { + runInternal(command, alreadyCompiled); + return new CommandProcessorResponse(getSchema(), null); +} catch (CommandProcessorException cpe) { + processRunException(cpe); + throw cpe; } } - private void prepareContext() throws CommandProcessorException { -if (context != null && context.getExplainAnalyze() != AnalyzeState.RUNNING) { - // close the existing ctx etc before compiling a new query, but does not destroy driver - closeInProcess(false); -} + private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException { +DriverState.setDriverState(driverState); +setInitialStateForRun(alreadyCompiled); +// a flag that helps to set the correct driver state in finally block by tracking if +// the method has been returned by an error or not. 
+boolean isFinishedWithError = true; try { - if (context == null) { -context = new Context(driverContext.getConf()); + HiveDriverRunHookContext hookContext = new HiveDriverRunHookContextImpl(driverContext.getConf(), + alreadyCompiled ? context.getCmd() : command); + runPreDriverHooks(hookContext); + + if (!alreadyCompiled) { +compileInternal(command, true); + } else { + driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime()); } -} catch (IOException e) { - throw new CommandProcessorException(e); -} -context.setHiveTxnManager(driverContext.getTxnManager()); -context.setStatsSource(driverContext.getStatsSource()); -context.setHDFSCleanup(true); + // Reset the PerfLogger so that it doesn't retain any previous values. + // Any value from compilation phase can be obtained through the map set in queryDisplay during compilation. + PerfLogger perfLogger = SessionState.getPerfLogger(true); -driverTxnHandler.setContext(context); - } + // the reason that we set the txn manager for the cxt h
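The `isFinishedWithError` flag in the `runInternal` diff above is a common try/finally idiom: the flag starts as `true` and is cleared only on the success path, so the `finally` block can distinguish normal completion from a propagating exception without catching it. A self-contained sketch, with illustrative names:

```java
// Sketch of the success-flag pattern from runInternal: the flag is only
// flipped to false after the body completes, so finally can record the
// right terminal state even while an exception propagates. Names are
// illustrative, not Hive's.
public class RunStateDemo {
    static final StringBuilder LOG = new StringBuilder();

    static void run(Runnable body) {
        boolean isFinishedWithError = true;
        try {
            body.run();
            isFinishedWithError = false; // reached only if body threw nothing
        } finally {
            // The real code sets driverState here; we just record which
            // branch the finally block would take.
            LOG.append(isFinishedWithError ? "ERROR;" : "EXECUTED;");
        }
    }

    public static void main(String[] args) {
        run(() -> {}); // clean run
        try {
            run(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the exception still reaches the caller, as in runInternal
        }
        System.out.println(LOG); // EXECUTED;ERROR;
    }
}
```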
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457516 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:53 Start Date: 11/Jul/20 04:53 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453155585 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); } - private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { -driverTxnHandler.createTxnManager(); -DriverState.setDriverState(driverState); -prepareContext(); -setQueryId(); + @Override + public CommandProcessorResponse run(String command) throws CommandProcessorException { Review comment: Javadoc maybe here too, but at least it is easier to understand :) Issue Time Tracking --- Worklog Id: (was: 457516) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457515 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:52 Start Date: 11/Jul/20 04:52 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453155521 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); Review comment: What does this public method do? Javadoc might be useful Issue Time Tracking --- Worklog Id: (was: 457515) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457514 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:49 Start Date: 11/Jul/20 04:49 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453155258 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -529,6 +529,34 @@ private void addTableFromEntity(Entity entity, Map tables) { .collect(Collectors.toList()); } + void rollback(CommandProcessorException cpe) throws CommandProcessorException { +try { + releaseLocksAndCommitOrRollback(false); +} catch (LockException e) { + LOG.error("rollback() FAILED: " + cpe); //make sure not to loose + DriverUtils.handleHiveException(driverContext, e, 12, "Additional info in hive.log at \"rollback() FAILED\""); +} + } + + void handleTransactionAfterExecution() throws CommandProcessorException { +try { + if (driverContext.getTxnManager().isImplicitTransactionOpen() || + driverContext.getPlan().getOperation() == HiveOperation.COMMIT) { +releaseLocksAndCommitOrRollback(true); + } else if (driverContext.getPlan().getOperation() == HiveOperation.ROLLBACK) { +releaseLocksAndCommitOrRollback(false); + } else if (!driverContext.getTxnManager().isTxnOpen() && + driverContext.getQueryState().getHiveOperation() == HiveOperation.REPLLOAD) { +// repl load during migration, commits the explicit txn and start some internal txns. Call +// releaseLocksAndCommitOrRollback to do the clean up. +releaseLocksAndCommitOrRollback(false); + } + // if none of the above is true, then txn (if there is one started) is not finished Review comment: How could this happen? Maybe at least a debug level log would be good. Issue Time Tracking --- Worklog Id: (was: 457514) Time Spent: 1h 20m (was: 1h 10m)
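The branching in `handleTransactionAfterExecution` above reduces to a small decision table: commit for implicit transactions or an explicit COMMIT, rollback for an explicit ROLLBACK or the repl-load cleanup path, otherwise leave the transaction open. A hedged sketch of that logic, using illustrative stand-in types rather than Hive's actual enums and txn manager:

```java
// Sketch of the commit-or-rollback decision in
// handleTransactionAfterExecution. The Operation enum and decide()
// helper are illustrative stand-ins for Hive's HiveOperation and
// HiveTxnManager, not the real types.
public class TxnDecision {
    enum Operation { COMMIT, ROLLBACK, REPLLOAD, OTHER }

    static String decide(boolean implicitTxnOpen, boolean txnOpen, Operation op) {
        if (implicitTxnOpen || op == Operation.COMMIT) {
            return "commit";        // releaseLocksAndCommitOrRollback(true)
        } else if (op == Operation.ROLLBACK) {
            return "rollback";      // releaseLocksAndCommitOrRollback(false)
        } else if (!txnOpen && op == Operation.REPLLOAD) {
            return "rollback";      // repl-load cleanup path
        }
        // None of the above: any started txn is left unfinished here,
        // the case the review comment asks about.
        return "leave-open";
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, Operation.OTHER));     // commit
        System.out.println(decide(false, true, Operation.ROLLBACK)); // rollback
        System.out.println(decide(false, true, Operation.OTHER));    // leave-open
    }
}
```

Note that in the real code the REPLLOAD check reads the operation from the query state rather than the plan; this sketch collapses both into one parameter for brevity.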
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457513 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:40 Start Date: 11/Jul/20 04:40 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453154673 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -410,260 +386,304 @@ private void compileInternal(String command, boolean deferClose) throws CommandP } } //Save compile-time PerfLogging for WebUI. -//Execution-time Perf logs are done by either another thread's PerfLogger -//or a reset PerfLogger. +//Execution-time Perf logs are done by either another thread's PerfLogger or a reset PerfLogger. driverContext.getQueryDisplay().setPerfLogStarts(QueryDisplay.Phase.COMPILATION, perfLogger.getStartTimes()); driverContext.getQueryDisplay().setPerfLogEnds(QueryDisplay.Phase.COMPILATION, perfLogger.getEndTimes()); } - private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException { + /** + * Compile a new query, but potentially reset taskID counter. Not resetting task counter + * is useful for generating re-entrant QL queries. + * @param command The HiveQL query to compile + * @param resetTaskIds Resets taskID counter if true. + * @return 0 for ok + */ + public int compile(String command, boolean resetTaskIds) { +try { + compile(command, resetTaskIds, false); + return 0; +} catch (CommandProcessorException cpr) { + return cpr.getErrorCode(); +} + } + + // deferClose indicates if the close/destroy should be deferred when the process has been + // interrupted, it should be set to true if the compile is called within another method like + // runInternal, which defers the close to the called in that method. 
+ @VisibleForTesting + public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { +preparForCompile(resetTaskIds); + +Compiler compiler = new Compiler(context, driverContext, driverState); +QueryPlan plan = compiler.compile(command, deferClose); +driverContext.setPlan(plan); + +compileFinished(deferClose); + } + + private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { +driverTxnHandler.createTxnManager(); DriverState.setDriverState(driverState); +prepareContext(); +setQueryId(); -driverState.lock(); -try { - if (alreadyCompiled) { -if (driverState.isCompiled()) { - driverState.executing(); -} else { - String errorMessage = "FAILED: Precompiled query has been cancelled or closed."; - CONSOLE.printError(errorMessage); - throw DriverUtils.createProcessorException(driverContext, 12, errorMessage, null, null); -} - } else { -driverState.compiling(); - } -} finally { - driverState.unlock(); +if (resetTaskIds) { + TaskFactory.resetId(); +} + } + + private void prepareContext() throws CommandProcessorException { +if (context != null && context.getExplainAnalyze() != AnalyzeState.RUNNING) { + // close the existing ctx etc before compiling a new query, but does not destroy driver + closeInProcess(false); } -// a flag that helps to set the correct driver state in finally block by tracking if -// the method has been returned by an error or not. -boolean isFinishedWithError = true; try { - HiveDriverRunHookContext hookContext = new HiveDriverRunHookContextImpl(driverContext.getConf(), - alreadyCompiled ? context.getCmd() : command); - // Get all the driver run hooks and pre-execute them. 
- try { -driverContext.getHookRunner().runPreDriverHooks(hookContext); - } catch (Exception e) { -String errorMessage = "FAILED: Hive Internal Error: " + Utilities.getNameMessage(e); -CONSOLE.printError(errorMessage + "\n" + StringUtils.stringifyException(e)); -throw DriverUtils.createProcessorException(driverContext, 12, errorMessage, -ErrorMsg.findSQLState(e.getMessage()), e); + if (context == null) { +context = new Context(driverContext.getConf()); } +} catch (IOException e) { + throw new CommandProcessorException(e); +} - if (!alreadyCompiled) { -// compile internal will automatically reset the perf logger -compileInternal(command, true); - } else { -// Since we're reusing the compiled plan, we need to update its start time for current run - driverContext.getPlan().setQueryStartTime(driverConte
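The `compile(String, boolean)` method in the diff above adapts the exception-throwing `compile` overload for legacy callers that expect an int return ("0 for ok", otherwise the error code). A self-contained sketch of that adapter pattern; the class, the simplified exception, and the demo error code 40000 are all illustrative, not Hive's actual values:

```java
// Sketch of the exception-to-error-code adapter shown in the diff:
// the modern API throws, the legacy API translates the exception into
// an int. Names and the demo error code are illustrative.
public class CompileAdapter {
    static class ProcessorException extends Exception {
        private final int errorCode;
        ProcessorException(int errorCode) { this.errorCode = errorCode; }
        int getErrorCode() { return errorCode; }
    }

    // Modern API: signals failure by throwing.
    static void compileOrThrow(String command) throws ProcessorException {
        if (command == null || command.isEmpty()) {
            throw new ProcessorException(40000); // arbitrary demo code
        }
    }

    // Legacy API: 0 for ok, otherwise the wrapped error code.
    static int compile(String command) {
        try {
            compileOrThrow(command);
            return 0;
        } catch (ProcessorException cpe) {
            return cpe.getErrorCode();
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("SELECT 1")); // 0
        System.out.println(compile(""));         // 40000
    }
}
```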
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457512 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:34 Start Date: 11/Jul/20 04:34 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453154237 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo queryInfo, HiveTxnManager txnMana driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState); } - /** - * Compile a new query, but potentially reset taskID counter. Not resetting task counter - * is useful for generating re-entrant QL queries. - * @param command The HiveQL query to compile - * @param resetTaskIds Resets taskID counter if true. - * @return 0 for ok - */ - public int compile(String command, boolean resetTaskIds) { -try { - compile(command, resetTaskIds, false); - return 0; -} catch (CommandProcessorException cpr) { - return cpr.getErrorCode(); -} + @Override + public Context getContext() { +return context; } - // deferClose indicates if the close/destroy should be deferred when the process has been - // interrupted, it should be set to true if the compile is called within another method like - // runInternal, which defers the close to the called in that method. 
- @VisibleForTesting - public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException { -preparForCompile(resetTaskIds); - -Compiler compiler = new Compiler(context, driverContext, driverState); -QueryPlan plan = compiler.compile(command, deferClose); -driverContext.setPlan(plan); - -compileFinished(deferClose); + @Override + public HiveConf getConf() { +return driverContext.getConf(); } - private void compileFinished(boolean deferClose) { -if (DriverState.getDriverState().isAborted() && !deferClose) { - closeInProcess(true); -} + @Override + public CommandProcessorResponse run() throws CommandProcessorException { +return run(null, true); } - private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException { -driverTxnHandler.createTxnManager(); -DriverState.setDriverState(driverState); -prepareContext(); -setQueryId(); + @Override + public CommandProcessorResponse run(String command) throws CommandProcessorException { +return run(command, false); + } -if (resetTaskIds) { - TaskFactory.resetId(); + private CommandProcessorResponse run(String command, boolean alreadyCompiled) throws CommandProcessorException { +try { + runInternal(command, alreadyCompiled); + return new CommandProcessorResponse(getSchema(), null); +} catch (CommandProcessorException cpe) { + processRunException(cpe); + throw cpe; } } - private void prepareContext() throws CommandProcessorException { -if (context != null && context.getExplainAnalyze() != AnalyzeState.RUNNING) { - // close the existing ctx etc before compiling a new query, but does not destroy driver - closeInProcess(false); -} + private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException { +DriverState.setDriverState(driverState); +setInitialStateForRun(alreadyCompiled); +// a flag that helps to set the correct driver state in finally block by tracking if +// the method has been returned by an error or not. 
+boolean isFinishedWithError = true; try { - if (context == null) { -context = new Context(driverContext.getConf()); + HiveDriverRunHookContext hookContext = new HiveDriverRunHookContextImpl(driverContext.getConf(), + alreadyCompiled ? context.getCmd() : command); + runPreDriverHooks(hookContext); + + if (!alreadyCompiled) { +compileInternal(command, true); + } else { + driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime()); } -} catch (IOException e) { - throw new CommandProcessorException(e); -} -context.setHiveTxnManager(driverContext.getTxnManager()); -context.setStatsSource(driverContext.getStatsSource()); -context.setHDFSCleanup(true); + // Reset the PerfLogger so that it doesn't retain any previous values. + // Any value from compilation phase can be obtained through the map set in queryDisplay during compilation. + PerfLogger perfLogger = SessionState.getPerfLogger(true); -driverTxnHandler.setContext(context); - } + // the reason that we set the txn manager for the cxt h
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457510 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:31 Start Date: 11/Jul/20 04:31 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453154061 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ##
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457509 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:28 Start Date: 11/Jul/20 04:28 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453153797 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ##
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457508 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:23 Start Date: 11/Jul/20 04:23 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453153417 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ##
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457507 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:19 Start Date: 11/Jul/20 04:19 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1222: URL: https://github.com/apache/hive/pull/1222#discussion_r453153111 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ##
+      // Reset the PerfLogger so that it doesn't retain any previous values.
+      // Any value from compilation phase can be obtained through the map set in queryDisplay during compilation.
+      PerfLogger perfLogger = SessionState.getPerfLogger(true);
Review comment: Shouldn't this be the first thing in the life cycle? Or minimally in this method? Previous code paths might have already started to use the perf logger.
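The refactored runInternal above tracks failure with a boolean so that its finally block can set the correct driver state on every exit path. A minimal, self-contained sketch of that pattern (the class and enum names here are stand-ins for illustration, not Hive's actual Driver/DriverState API):

```java
// Sketch of the "isFinishedWithError" tracking pattern from runInternal:
// assume failure until the happy path completes, then flip the flag, so the
// finally block can distinguish a clean return from an exception exit.
class RunStateSketch {
    enum State { IDLE, RUNNING, ERROR, FINISHED }

    private State state = State.IDLE;

    State getState() {
        return state;
    }

    void run(Runnable body) {
        state = State.RUNNING;
        boolean isFinishedWithError = true;
        try {
            body.run();
            isFinishedWithError = false; // only reached when body succeeded
        } finally {
            // runs whether body returned normally or threw
            state = isFinishedWithError ? State.ERROR : State.FINISHED;
        }
    }
}
```

The advantage over catching and rethrowing is that a single finally block handles both outcomes, which is why the review thread cares about what runs before the flag is set.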
[jira] [Resolved] (HIVE-23825) Create a flag to turn off _orc_acid_version file creation
[ https://issues.apache.org/jira/browse/HIVE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-23825. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~klcopp]! > Create a flag to turn off _orc_acid_version file creation > - > > Key: HIVE-23825 > URL: https://issues.apache.org/jira/browse/HIVE-23825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We do not really use the version files, and creating them could be costly. > We would like to add the possibility to prevent the overhead, and do not > create them -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23825) Create a flag to turn off _orc_acid_version file creation
[ https://issues.apache.org/jira/browse/HIVE-23825?focusedWorklogId=457504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457504 ] ASF GitHub Bot logged work on HIVE-23825: - Author: ASF GitHub Bot Created on: 11/Jul/20 04:13 Start Date: 11/Jul/20 04:13 Worklog Time Spent: 10m Work Description: pvary merged pull request #1236: URL: https://github.com/apache/hive/pull/1236 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457504) Time Spent: 20m (was: 10m) > Create a flag to turn off _orc_acid_version file creation > - > > Key: HIVE-23825 > URL: https://issues.apache.org/jira/browse/HIVE-23825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We do not really use the version files, and creating them could be costly. > We would like to add the possibility to prevent the overhead, and do not > create them -- This message was sent by Atlassian Jira (v8.3.4#803005)
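The merged change gates creation of the `_orc_acid_version` marker file behind a configuration flag to avoid one extra file create per write. A hedged sketch of the idea, using `java.nio` and a plain boolean parameter in place of Hive's actual HiveConf flag and Hadoop FileSystem calls:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: only write the _orc_acid_version marker when the
// (hypothetical) flag is enabled, saving a file creation per delta directory.
class AcidVersionFileSketch {
    static final String ACID_VERSION_FILE = "_orc_acid_version";

    static void maybeWriteVersionFile(Path deltaDir, int version,
                                      boolean writeVersionFile) throws IOException {
        if (!writeVersionFile) {
            return; // flag off: skip the marker entirely
        }
        Files.write(deltaDir.resolve(ACID_VERSION_FILE),
                Integer.toString(version).getBytes());
    }
}
```

Readers that tolerate a missing marker (as the issue says Hive does) keep working either way; the flag only trades the marker's debuggability for reduced NameNode load.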
[jira] [Work logged] (HIVE-23351) Ranger Replication Scheduling
[ https://issues.apache.org/jira/browse/HIVE-23351?focusedWorklogId=457455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457455 ] ASF GitHub Bot logged work on HIVE-23351: - Author: ASF GitHub Bot Created on: 11/Jul/20 00:31 Start Date: 11/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1004: URL: https://github.com/apache/hive/pull/1004#issuecomment-656949812 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457455) Time Spent: 3h (was: 2h 50m) > Ranger Replication Scheduling > - > > Key: HIVE-23351 > URL: https://issues.apache.org/jira/browse/HIVE-23351 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23351.01.patch, HIVE-23351.02.patch, > HIVE-23351.03.patch, HIVE-23351.04.patch, HIVE-23351.05.patch, > HIVE-23351.06.patch, HIVE-23351.07.patch, HIVE-23351.08.patch, > HIVE-23351.09.patch, HIVE-23351.10.patch, HIVE-23351.10.patch, > HIVE-23351.11.patch, HIVE-23351.12.patch > > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query
[ https://issues.apache.org/jira/browse/HIVE-23339?focusedWorklogId=457456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457456 ] ASF GitHub Bot logged work on HIVE-23339: - Author: ASF GitHub Bot Created on: 11/Jul/20 00:31 Start Date: 11/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1011: URL: https://github.com/apache/hive/pull/1011#issuecomment-656949806 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457456) Time Spent: 20m (was: 10m) > SBA does not check permissions for DB location specified in Create database > query > - > > Key: HIVE-23339 > URL: https://issues.apache.org/jira/browse/HIVE-23339 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.0 >Reporter: Riju Trivedi >Assignee: Shubham Chaurasia >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23339.01.patch > > Time Spent: 20m > Remaining Estimate: 0h > > With doAs=true and StorageBasedAuthorization provider, create database with > specific location succeeds even if user doesn't have access to that path. > >
> {code:java}
> hadoop fs -ls -d /tmp/cannot_write
> drwx-- - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write
>
> create a database under /tmp/cannot_write. We would expect it to fail, but is
> actually created successfully with "hive" as the owner:
> rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location '/tmp/cannot_write/rtrivedi_1'"
> INFO : OK
> No rows affected (0.116 seconds)
>
> hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write
> Found 1 items
> drwx-- - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
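The missing step in the bug above is a write-permission check on the target location before the database directory is created. A self-contained sketch under the assumption that checking the nearest existing ancestor is the desired semantics; `java.nio` stands in for Hadoop's FileSystem and the StorageBasedAuthorizationProvider plumbing, so this is an illustration of the check, not the actual fix:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the missing authorization step: before creating a database
// directory at a user-supplied location, require write access on the
// nearest existing ancestor directory (mkdir -p semantics).
class DbLocationCheckSketch {
    static void checkCanCreate(Path dbLocation) throws IOException {
        Path ancestor = dbLocation.getParent();
        while (ancestor != null && !Files.exists(ancestor)) {
            ancestor = ancestor.getParent();
        }
        if (ancestor == null || !Files.isWritable(ancestor)) {
            throw new IOException("Permission denied: cannot create " + dbLocation);
        }
    }
}
```

With doAs=true the check must of course run as the requesting user, not the `hive` service user, which is the part SBA currently skips for this code path.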
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=457312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457312 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 10/Jul/20 19:11 Start Date: 10/Jul/20 19:11 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1241: URL: https://github.com/apache/hive/pull/1241#discussion_r453030985 ## File path: standalone-metastore/metastore-server/src/main/resources/datanucleus-log4j.properties ## @@ -15,3 +15,5 @@ log4j.category.DataNucleus.ValueGeneration=DEBUG, A1 log4j.category.DataNucleus.Enhancer=INFO, A1 log4j.category.DataNucleus.SchemaTool=INFO, A1 + +log4j.category.DataNucleus.Persistence=INFO, A1 Review comment: Remove this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457312) Time Spent: 20m (was: 10m) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22415: -- Labels: pull-request-available (was: ) > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=457311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457311 ] ASF GitHub Bot logged work on HIVE-22415: - Author: ASF GitHub Bot Created on: 10/Jul/20 19:10 Start Date: 10/Jul/20 19:10 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1241: URL: https://github.com/apache/hive/pull/1241 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457311) Remaining Estimate: 0h Time Spent: 10m > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23700) HiveConf static initialization fails when JAR URI is opaque
[ https://issues.apache.org/jira/browse/HIVE-23700?focusedWorklogId=457273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457273 ] ASF GitHub Bot logged work on HIVE-23700: - Author: ASF GitHub Bot Created on: 10/Jul/20 17:39 Start Date: 10/Jul/20 17:39 Worklog Time Spent: 10m Work Description: uptycs-anudeep opened a new pull request #1240: URL: https://github.com/apache/hive/pull/1240 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457273) Remaining Estimate: 119h 20m (was: 119.5h) Time Spent: 40m (was: 0.5h) > HiveConf static initialization fails when JAR URI is opaque > --- > > Key: HIVE-23700 > URL: https://issues.apache.org/jira/browse/HIVE-23700 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.7 >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23700.1.patch > > Original Estimate: 120h > Time Spent: 40m > Remaining Estimate: 119h 20m > > HiveConf static initialization fails when the jar URI is opaque, for example > when it's embedded as a fat jar in a spring boot application. Then > initialization of the HiveConf static block fails and the HiveConf class does > not get classloaded. 
The opaque URI in my case looks like this > _jar:file:/usr/local/server/some-service-jar.jar!/BOOT-INF/lib/hive-common-2.3.7.jar!/_ > HiveConf#findConfigFile should be able to handle `IllegalArgumentException` > when the jar `URI` provided to `File` throws the exception. > To surface this issue three conditions need to be met. > 1. hive-site.xml should not be on the classpath > 2. hive-site.xml should not be on "HIVE_CONF_DIR" > 3. hive-site.xml should not be on "HIVE_HOME" -- This message was sent by Atlassian Jira (v8.3.4#803005)
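The root cause is that `new File(URI)` accepts only hierarchical `file:` URIs and throws `IllegalArgumentException` for opaque `jar:file:...!/...` URIs like the one above. A self-contained sketch of the kind of guard the patch proposes around the config-file lookup (the method name is illustrative, not HiveConf's actual signature):

```java
import java.io.File;
import java.net.URI;

// Sketch of the HiveConf fix: a classpath resource nested inside a fat jar
// resolves to an opaque jar: URI, and new File(uri) rejects opaque URIs with
// IllegalArgumentException. Check first and treat it as "not a plain file"
// instead of letting the static initializer fail.
class ConfigFileLookupSketch {
    static File toFileOrNull(URI uri) {
        if (uri.isOpaque() || !"file".equalsIgnoreCase(uri.getScheme())) {
            return null; // not a plain file on disk; keep searching elsewhere
        }
        try {
            return new File(uri);
        } catch (IllegalArgumentException e) {
            return null; // defensive: any other non-hierarchical case
        }
    }
}
```

Returning null lets the lookup fall through to HIVE_CONF_DIR and HIVE_HOME, matching the three-condition scenario described in the issue.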
[jira] [Commented] (HIVE-23753) Make LLAP Secretmanager token path configurable
[ https://issues.apache.org/jira/browse/HIVE-23753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155636#comment-17155636 ] Rajkumar Singh commented on HIVE-23753: --- https://github.com/apache/hive/pull/1171 > Make LLAP Secretmanager token path configurable > --- > > Key: HIVE-23753 > URL: https://issues.apache.org/jira/browse/HIVE-23753 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > > In a very Busy LLAP cluster if for some reason the Tokens under > zkdtsm_hive_llap0 zk path are not cleaned then LLAP Daemon startup takes a > very long time to startup, this may lead to service outage if LLAP daemons > are not started and the number of retries while checking LLAP app status > exceeds. upon looking the jstack of llap daemon it seems to traverse the > zkdtsm_hive_llap0 zk path before starting the secret manager. > {code:java} >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386) > - locked <0x7fef36cdd338> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:142) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:138) > at > 
org.apache.curator.framework.recipes.cache.PathChildrenCache.internalRebuildNode(PathChildrenCache.java:591) > at > org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:331) > at > org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.startThreads(ZKDelegationTokenSecretManager.java:370) > at > org.apache.hadoop.hive.llap.security.SecretManager.startThreads(SecretManager.java:82) > at > org.apache.hadoop.hive.llap.security.SecretManager$1.run(SecretManager.java:223) > at > org.apache.hadoop.hive.llap.security.SecretManager$1.run(SecretManager.java:218) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hive.llap.security.SecretManager.createSecretManager(SecretManager.java:218) > at > org.apache.hadoop.hive.llap.security.SecretManager.createSecretManager(SecretManager.java:212) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.<init>(LlapDaemon.java:279) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
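The stack above shows PathChildrenCache.rebuild() walking every token znode under the secret manager's working path inside LlapDaemon's constructor, which is what stalls startup on a busy cluster. Hadoop's ZKDelegationTokenSecretManager already exposes its znode working path as a configuration key; a configurable LLAP token path would presumably be plumbed through to it. A hedged sketch (the property name is Hadoop's `zk-dt-secret-manager` prefix from memory, and the value shown is purely illustrative):

```xml
<!-- Illustrative only: Hadoop's ZKDelegationTokenSecretManager working-path key.
     Pointing it at a fresh path would let a daemon start without rebuilding a
     large stale token cache such as the one under zkdtsm_hive_llap0. -->
<property>
  <name>zk-dt-secret-manager.znodeWorkingPath</name>
  <value>hive_llap_new</value>
</property>
```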
[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command
[ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155621#comment-17155621 ] Syed Shameerur Rahman commented on HIVE-22957: -- Tests have passed now! > Support Partition Filtering In MSCK REPAIR TABLE Command > > > Key: HIVE-22957 > URL: https://issues.apache.org/jira/browse/HIVE-22957 > Project: Hive > Issue Type: Improvement >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf > > Time Spent: 4h 40m > Remaining Estimate: 0h > > *Design Doc:* > [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23809) Data loss occurs when using tez engine to join different bucketing_version tables
[ https://issues.apache.org/jira/browse/HIVE-23809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangQiDong updated HIVE-23809: --- Attachment: HIVE-23809.1.patch > Data loss occurs when using tez engine to join different bucketing_version > tables > - > > Key: HIVE-23809 > URL: https://issues.apache.org/jira/browse/HIVE-23809 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Affects Versions: 3.1.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: hive, tez > Attachments: HIVE-23809.1.patch > > Original Estimate: 12h > Remaining Estimate: 12h > > *Test case:* > create table table_a (a int, b string,c string); > create table table_b (a int, b string,c string); > insert into table_a values > (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg'); > insert into table_b values > (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg'); > alter table table_a set tblproperties ("bucketing_version"='1'); > alter table table_b set tblproperties ("bucketing_version"='2'); > *Hivesql:* > *set hive.auto.convert.join=false;* > *set mapred.reduce.tasks=2;* > select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb > on(ta.a=tb.a); > set hive.execution.engine=mr; > +---+-+ > |a_a|b_b| > +---+-+ > |5|e| > |6|f| > |7|g| > |11|a| > |22|b| > |33|c| > |44|d| > +---+-+ > set hive.execution.engine=tez; > +---+-+ > |a_a|b_b| > +---+-+ > |6|f| > |5|e| > |11|a| > |33|c| > +---+-+ > > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23809) Data loss occurs when using tez engine to join different bucketing_version tables
[ https://issues.apache.org/jira/browse/HIVE-23809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangQiDong updated HIVE-23809: --- Hadoop Flags: Reviewed Tags: Tez Target Version/s: 3.1.0 Status: Patch Available (was: Open) The modification logic is the same as that of HIVE-22098 patch. However, the HIVE-22098 patch only works for hive on Mr. HIVE-23809 patch solves the problem of hive on Tez. If you want tez and MR to have the same result, you need to apply the HIVE-22098 patch and HIVE-23809 patch at the same time. > Data loss occurs when using tez engine to join different bucketing_version > tables > - > > Key: HIVE-23809 > URL: https://issues.apache.org/jira/browse/HIVE-23809 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Affects Versions: 3.1.0 >Reporter: ZhangQiDong >Assignee: ZhangQiDong >Priority: Major > Labels: hive, tez > Original Estimate: 12h > Remaining Estimate: 12h > > *Test case:* > create table table_a (a int, b string,c string); > create table table_b (a int, b string,c string); > insert into table_a values > (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg'); > insert into table_b values > (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg'); > alter table table_a set tblproperties ("bucketing_version"='1'); > alter table table_b set tblproperties ("bucketing_version"='2'); > *Hivesql:* > *set hive.auto.convert.join=false;* > *set mapred.reduce.tasks=2;* > select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb > on(ta.a=tb.a); > set hive.execution.engine=mr; > +---+-+ > |a_a|b_b| > +---+-+ > |5|e| > |6|f| > |7|g| > |11|a| > |22|b| > |33|c| > |44|d| > +---+-+ > set hive.execution.engine=tez; > +---+-+ > |a_a|b_b| > +---+-+ > |6|f| > |5|e| > |11|a| > |33|c| > +---+-+ > > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
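The failure pattern in the repro above is general: in a shuffle join each side's ReduceSink hashes the join key to pick a reducer, and if the two tables' bucketing_version settings make the two sides use different hash functions, equal keys can land on different reducers and the matching rows never meet. A minimal sketch of the effect with two made-up hash functions (Hive's v2 bucketing actually uses a Murmur3-based hash; the functions here are stand-ins only):

```java
import java.util.ArrayList;
import java.util.List;

public class BucketMismatch {
    // Illustrative stand-ins for two bucketing hash versions; not Hive's real hashes.
    static int hashV1(int key) { return key; }                    // identity, hashCode-style
    static int hashV2(int key) { return Integer.reverse(key); }   // any different distribution

    static int reducerFor(int hash, int numReducers) {
        return Math.floorMod(hash, numReducers);
    }

    public static void main(String[] args) {
        int numReducers = 2;                       // mapred.reduce.tasks=2, as in the repro
        int[] keys = {11, 22, 33, 44, 5, 6, 7};    // the join keys from table_a/table_b
        List<Integer> lost = new ArrayList<>();
        for (int k : keys) {
            // table_a rows routed with one hash, table_b rows with the other
            if (reducerFor(hashV1(k), numReducers) != reducerFor(hashV2(k), numReducers)) {
                lost.add(k);                       // the two sides never meet on one reducer
            }
        }
        System.out.println("keys silently dropped from the join: " + lost);
    }
}
```

This is why aligning the hash choice on both sides (the HIVE-22098 plus HIVE-23809 combination mentioned above) is required for MR and Tez to agree.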
[jira] [Work logged] (HIVE-23830) Remove shutdownhook after query is completed
[ https://issues.apache.org/jira/browse/HIVE-23830?focusedWorklogId=457211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457211 ] ASF GitHub Bot logged work on HIVE-23830: - Author: ASF GitHub Bot Created on: 10/Jul/20 15:54 Start Date: 10/Jul/20 15:54 Worklog Time Spent: 10m Work Description: mustafaiman commented on a change in pull request #1235: URL: https://github.com/apache/hive/pull/1235#discussion_r452929806 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -553,11 +553,13 @@ private void release(boolean releaseLocks) { LOG.warn("Exception when releasing locking in destroy: " + e.getMessage()); } } -ShutdownHookManager.removeShutdownHook(shutdownRunner); +ShutdownHookManager.removeShutdownHook(txnRollbackRunner); } void releaseLocksAndCommitOrRollback(boolean commit) throws LockException { Review comment: I would not rename to `commitAndCleanup` because this method also rolls back transaction. I'll rename it to `endTransactionAndCleanup` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457211) Time Spent: 1h (was: 50m) > Remove shutdownhook after query is completed > > > Key: HIVE-23830 > URL: https://issues.apache.org/jira/browse/HIVE-23830 > Project: Hive > Issue Type: Bug >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Each query registers a shutdownHook to release transactional resources in > case JVM shuts down mid query. These hooks are not cleaned up until session > is closed. Session life time is unbounded. So these hooks are a memory leak. > They should be cleaned as soon as transaction is completed. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
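The leak described in HIVE-23830 follows a general pattern: a per-query hook registered against a JVM-lifetime registry outlives the query unless it is explicitly deregistered. Hive's own ShutdownHookManager is the registry involved; the sketch below shows the same register-then-remove discipline using the plain JVM API (java.lang.Runtime) as a stand-in:

```java
public class HookLifecycle {
    public static void main(String[] args) {
        Thread rollbackOnJvmExit = new Thread(
            () -> System.err.println("rolling back open transaction"));
        Runtime.getRuntime().addShutdownHook(rollbackOnJvmExit);
        try {
            // ... run the query: if the JVM dies here, the hook fires ...
        } finally {
            // Deregister as soon as the transaction completes, so the hook
            // (and everything it references) becomes garbage-collectable.
            boolean removed = Runtime.getRuntime().removeShutdownHook(rollbackOnJvmExit);
            System.out.println("hook removed: " + removed);
        }
    }
}
```

Without the `finally` removal, each query's hook (and the transactional state it captures) stays reachable for the whole unbounded session lifetime, which is exactly the leak the patch closes.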
[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-23836: -- Description: {quote} If you want the deletion of a persistent object to cause the deletion of related objects then you need to mark the related fields in the mapping to be "dependent". {quote} http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object The database won't do it: {code:sql|title=Derby Schema} ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO ACTION; {code} https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 was: {quote} If you want the deletion of a persistent object to cause the deletion of related objects then you need to mark the related fields in the mapping to be "dependent". {quote} http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". 
> {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > The database won't do it: > {code:sql|title=Derby Schema} > ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY > ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO > ACTION; > {code} > https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 -- This message was sent by Atlassian Jira (v8.3.4#803005)
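Since the Derby foreign key above is declared ON DELETE NO ACTION, the cascade has to come from the JDO layer. In JDO metadata that is a one-attribute change on the collection mapping; a hedged sketch of what the package.jdo entry for a "cols" field might look like (the element type follows the metastore model naming, but the exact existing mapping is not shown in this thread):

```xml
<field name="cols" table="COLUMNS_V2">
  <!-- dependent-element="true" asks DataNucleus to delete each element
       object when it is removed from the owning object's collection -->
  <collection element-type="org.apache.hadoop.hive.metastore.model.MFieldSchema"
              dependent-element="true"/>
</field>
```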
[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23836: -- Labels: pull-request-available (was: ) > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=457196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457196 ] ASF GitHub Bot logged work on HIVE-23836: - Author: ASF GitHub Bot Created on: 10/Jul/20 15:19 Start Date: 10/Jul/20 15:19 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1239: URL: https://github.com/apache/hive/pull/1239 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457196) Remaining Estimate: 0h Time Spent: 10m > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-23836: -- Description: {quote} If you want the deletion of a persistent object to cause the deletion of related objects then you need to mark the related fields in the mapping to be "dependent". {quote} http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". > {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23836: - > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457190 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 10/Jul/20 15:03 Start Date: 10/Jul/20 15:03 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-656722071 > Foreign Keys > > So we now have given the datastore control over the cascade deletion strategy for objects stored in these tables. Please be aware that JDO provides Dependent Fields as a way of allowing cascade deletion. The difference here is that Dependent Fields is controlled by DataNucleus, whereas foreign key delete actions are controlled by the datastore (assuming the datastore supports it even) ``` http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#fk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457190) Time Spent: 2.5h (was: 2h 20m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457189 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 10/Jul/20 15:02 Start Date: 10/Jul/20 15:02 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-656722071 ``` Foreign Keys So we now have given the datastore control over the cascade deletion strategy for objects stored in these tables. Please be aware that JDO provides Dependent Fields as a way of allowing cascade deletion. The difference here is that Dependent Fields is controlled by DataNucleus, whereas foreign key delete actions are controlled by the datastore (assuming the datastore supports it even) ``` http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#fk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457189) Time Spent: 2h 20m (was: 2h 10m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2
[ https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155534#comment-17155534 ] Shashank Pedamallu commented on HIVE-23509: --- Thank you very much for getting this through! > MapJoin AssertionError: Capacity must be power of 2 > --- > > Key: HIVE-23509 > URL: https://issues.apache.org/jira/browse/HIVE-23509 > Project: Hive > Issue Type: Bug > Environment: Hive-2.3.6 >Reporter: Shashank Pedamallu >Assignee: Shashank Pedamallu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Observed AssertionError errors in Hive query when rowCount for join is issued > as (2^x)+(2^(x+1)). > Following is the stacktrace: > {noformat} > [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : > Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, > diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: > java.lang.AssertionError: Capacity must be a power of two [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.security.AccessController.doPrivileged(Native Method) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: > Capacity must be a power of two [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.j
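The assertion fires because the hash map's capacity must be a power of two (so slot lookup can use `hash & (capacity - 1)`), while a row-count estimate shaped like 2^x + 2^(x+1) = 3·2^x produces a non-power-of-two target when a spilled partition is reloaded. The usual remedy is to round the requested capacity up before validating it; a sketch of that rounding (not necessarily the exact fix that was merged):

```java
public class Capacity {
    // Round n up to the next power of two (minimum 1); leaves exact powers unchanged.
    static int nextPowerOfTwo(int n) {
        return (n <= 1) ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        int rowCount = (1 << 10) + (1 << 11);     // 3 * 2^10 = 3072, the failing shape
        int capacity = nextPowerOfTwo(rowCount);  // rounds up, safe for hash & (cap - 1)
        System.out.println(capacity);
        assert (capacity & (capacity - 1)) == 0;  // the invariant validateCapacity asserts
    }
}
```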
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457186 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 10/Jul/20 14:54 Start Date: 10/Jul/20 14:54 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-656718317 ``` ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO ACTION; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457186) Time Spent: 2h 10m (was: 2h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22412) StatsUtils throw NPE when explain
[ https://issues.apache.org/jira/browse/HIVE-22412?focusedWorklogId=457183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457183 ] ASF GitHub Bot logged work on HIVE-22412: - Author: ASF GitHub Bot Created on: 10/Jul/20 14:45 Start Date: 10/Jul/20 14:45 Worklog Time Spent: 10m Work Description: StefanXiepj commented on a change in pull request #1209: URL: https://github.com/apache/hive/pull/1209#discussion_r452889505 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -1336,6 +1341,9 @@ public static long getSizeOfPrimitiveTypeArraysFromType(String colType, int leng */ public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) { Map map = scmoi.getWritableConstantValue(); +if (null == map || map.isEmpty()) { + return 0L; +} Review comment: @belugabehr & @kgyrtkirk , I agree entirely with you! It have been updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457183) Time Spent: 3h (was: 2h 50m) > StatsUtils throw NPE when explain > - > > Key: HIVE-22412 > URL: https://issues.apache.org/jira/browse/HIVE-22412 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1, 2.0.0, 3.0.0 >Reporter: xiepengjie >Assignee: xiepengjie >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22412.patch > > Time Spent: 3h > Remaining Estimate: 0h > > The demo like this: > {code:java} > drop table if exists explain_npe_map; > drop table if exists explain_npe_array; > drop table if exists explain_npe_struct; > create table explain_npe_map( c1 map ); > create table explain_npe_array ( c1 array ); > create table explain_npe_struct ( c1 struct ); > -- error > set hive.cbo.enable=false; > explain select c1 from explain_npe_map where c1 is null; > explain select c1 from explain_npe_array where c1 is null; > explain select c1 from explain_npe_struct where c1 is null; > -- correct > set hive.cbo.enable=true; > explain select c1 from explain_npe_map where c1 is null; > explain select c1 from explain_npe_array where c1 is null; > explain select c1 from explain_npe_struct where c1 is null;{code} > > if the conf 'hive.cbo.enable' set false , NPE will be thrown ; otherwise will > not. 
> {code:java} > hive> drop table if exists explain_npe_map; > OK > Time taken: 0.063 seconds > hive> drop table if exists explain_npe_array; > OK > Time taken: 0.035 seconds > hive> drop table if exists explain_npe_struct; > OK > Time taken: 0.015 seconds > hive> > > create table explain_npe_map( c1 map ); > OK > Time taken: 0.584 seconds > hive> create table explain_npe_array ( c1 array ); > OK > Time taken: 0.216 seconds > hive> create table explain_npe_struct ( c1 struct ); > OK > Time taken: 0.17 seconds > hive> > > set hive.cbo.enable=false; > hive> explain select c1 from explain_npe_map where c1 is null; > FAILED: NullPointerException null > hive> explain select c1 from explain_npe_array where c1 is null; > FAILED: NullPointerException null > hive> explain select c1 from explain_npe_struct where c1 is null; > FAILED: RuntimeException Error invoking signature method > hive> > > set hive.cbo.enable=true; > hive> explain select c1 from explain_npe_map where c1 is null; > OK > STAGE DEPENDENCIES: > Stage-0 is a root stageSTAGE PLANS: > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > TableScan > alias: explain_npe_map > Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column > stats: NONE > Filter Operator > predicate: false (type: boolean) > Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column > stats: NONE > Select Operator > expressions: c1 (type: map) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL > Column stats: NONE > ListSinkTime taken: 1.593 seconds, Fetched: 20 row(s) > hive> explain select c1 from explain_npe_array where c1 is null; > OK > STAGE DEPENDENCIES: > Stage-0 is a root stageSTAGE PLANS: > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > TableScan > alias: explain_npe_array > Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column > stats:
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=457180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457180 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 10/Jul/20 14:28 Start Date: 10/Jul/20 14:28 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1197: URL: https://github.com/apache/hive/pull/1197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457180) Time Spent: 1h 10m (was: 1h) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=457179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457179 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 10/Jul/20 14:27 Start Date: 10/Jul/20 14:27 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1197: URL: https://github.com/apache/hive/pull/1197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457179) Time Spent: 1h (was: 50m) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23638. - Fix Version/s: 4.0.0 Resolution: Fixed pushed to master. Thank you [~pgaref]! > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: spotbugsXml.xml > > Time Spent: 2h 40m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=457160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457160 ] ASF GitHub Bot logged work on HIVE-23638: - Author: ASF GitHub Bot Created on: 10/Jul/20 13:13 Start Date: 10/Jul/20 13:13 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1161: URL: https://github.com/apache/hive/pull/1161 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457160) Time Spent: 2h 40m (was: 2.5h) > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 2h 40m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path
[ https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita updated HIVE-23824: -- Status: Patch Available (was: Open) > LLAP - add API to look up ORC metadata for certain Path > --- > > Key: HIVE-23824 > URL: https://issues.apache.org/jira/browse/HIVE-23824 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP IO supports caching but currently this is only done via LlapRecordReader > / using splits, aka good old mapreduce way. > At certain times it would worth to leverage the caching of files on certain > paths, that are not necessarily associated with a record reader directly. An > example of this could be the caching of ACID delete delta files, as they are > currently being read without caching. > With this patch we'd extend the LLAP API and offer another entry point for > retrieving metadata of ORC files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path
[ https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23824: -- Labels: pull-request-available (was: ) > LLAP - add API to look up ORC metadata for certain Path > --- > > Key: HIVE-23824 > URL: https://issues.apache.org/jira/browse/HIVE-23824 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP IO supports caching but currently this is only done via LlapRecordReader > / using splits, aka good old mapreduce way. > At certain times it would worth to leverage the caching of files on certain > paths, that are not necessarily associated with a record reader directly. An > example of this could be the caching of ACID delete delta files, as they are > currently being read without caching. > With this patch we'd extend the LLAP API and offer another entry point for > retrieving metadata of ORC files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path
[ https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457155 ] ASF GitHub Bot logged work on HIVE-23824: - Author: ASF GitHub Bot Created on: 10/Jul/20 13:06 Start Date: 10/Jul/20 13:06 Worklog Time Spent: 10m Work Description: szlta opened a new pull request #1238: URL: https://github.com/apache/hive/pull/1238 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457155) Remaining Estimate: 0h Time Spent: 10m > LLAP - add API to look up ORC metadata for certain Path > --- > > Key: HIVE-23824 > URL: https://issues.apache.org/jira/browse/HIVE-23824 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > LLAP IO supports caching but currently this is only done via LlapRecordReader > / using splits, aka good old mapreduce way. > At certain times it would worth to leverage the caching of files on certain > paths, that are not necessarily associated with a record reader directly. An > example of this could be the caching of ACID delete delta files, as they are > currently being read without caching. > With this patch we'd extend the LLAP API and offer another entry point for > retrieving metadata of ORC files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path
[ https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita updated HIVE-23824: -- Description: LLAP IO supports caching but currently this is only done via LlapRecordReader / using splits, aka good old mapreduce way. At certain times it would worth to leverage the caching of files on certain paths, that are not necessarily associated with a record reader directly. An example of this could be the caching of ACID delete delta files, as they are currently being read without caching. With this patch we'd extend the LLAP API and offer another entry point for retrieving metadata of ORC files. > LLAP - add API to look up ORC metadata for certain Path > --- > > Key: HIVE-23824 > URL: https://issues.apache.org/jira/browse/HIVE-23824 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > > LLAP IO supports caching but currently this is only done via LlapRecordReader > / using splits, aka good old mapreduce way. > At certain times it would worth to leverage the caching of files on certain > paths, that are not necessarily associated with a record reader directly. An > example of this could be the caching of ACID delete delta files, as they are > currently being read without caching. > With this patch we'd extend the LLAP API and offer another entry point for > retrieving metadata of ORC files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
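A minimal sketch of what such a path-keyed metadata entry point could look like — all class and method names here are hypothetical stand-ins, not the actual LLAP API, and the real implementation would parse the ORC tail rather than fabricate a value:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LlapMetadataLookup {
    // Hypothetical stand-in for parsed ORC file metadata (footer/stripe info).
    static class OrcFileMetadata {
        final int stripeCount;
        OrcFileMetadata(int stripeCount) { this.stripeCount = stripeCount; }
    }

    private final Map<String, OrcFileMetadata> cache = new ConcurrentHashMap<>();

    OrcFileMetadata getOrLoad(String path) {
        // Serve from cache when present; otherwise read the footer once
        // and populate the cache for subsequent callers.
        return cache.computeIfAbsent(path, this::readFooter);
    }

    private OrcFileMetadata readFooter(String path) {
        // In LLAP this would read and parse the ORC tail from the
        // file system; a fixed value stands in here.
        return new OrcFileMetadata(1);
    }

    public static void main(String[] args) {
        LlapMetadataLookup lookup = new LlapMetadataLookup();
        OrcFileMetadata first = lookup.getOrLoad("/warehouse/t/delete_delta_1/bucket_0");
        OrcFileMetadata second = lookup.getOrLoad("/warehouse/t/delete_delta_1/bucket_0");
        // Same instance back: the second lookup never touched the file system,
        // which is the point of caching delete delta metadata.
        System.out.println(first == second);
    }
}
```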
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457139 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:41 Start Date: 10/Jul/20 12:41 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452816736 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -330,6 +333,20 @@ a database ( directory ) return 0; } + private void addLazyDataCopyTask(TaskTracker loadTaskTracker) { Review comment: This is only for external tables. This will be before metadata copy as we are doing currently for external tables. ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java ## @@ -148,6 +148,13 @@ public static synchronized ReplChangeManager getInstance(Configuration conf) return instance; } + public static synchronized ReplChangeManager getInstance() { Review comment: Needed utility method of ReplChangeManager which earlier used to be static. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457139) Time Spent: 2h 40m (was: 2.5h) > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. 
In case > of a database with a very large number of tables/partitions, such an iterator may > cause the HS2 process to go OOM. > Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
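The pattern under review in this PR — a bounded queue drained to a backing file by a streamer thread, so the full table/partition list never sits in memory at once — can be sketched as follows. This is a simplified stand-alone illustration, not the patch's FileListStreamer:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FileBackedList {
    // Bounded queue: producers block when it fills, capping memory use.
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
    private final Thread streamer;
    private volatile boolean stop = false;

    public FileBackedList(Path backingFile) throws IOException {
        BufferedWriter writer = Files.newBufferedWriter(backingFile);
        streamer = new Thread(() -> {
            try (BufferedWriter w = writer) {
                // Keep draining until asked to stop AND the queue is empty,
                // so no entries are lost on shutdown.
                while (!stop || !queue.isEmpty()) {
                    String next = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (next != null) {
                        w.write(next);
                        w.newLine();
                    }
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        streamer.start();
    }

    public void add(String entry) throws InterruptedException {
        queue.put(entry); // blocks when the queue is full
    }

    public void close() throws InterruptedException {
        stop = true;
        streamer.join(); // wait for the remaining entries to be flushed
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("filelist", ".txt");
        FileBackedList list = new FileBackedList(file);
        for (int i = 0; i < 1000; i++) {
            list.add("entry-" + i);
        }
        list.close();
        System.out.println(Files.readAllLines(file).size());
    }
}
```

The shutdown ordering mirrors the point debated in the review: `close()` must not return until the streamer has consumed everything still queued.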
[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command
[ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155448#comment-17155448 ] Syed Shameerur Rahman commented on HIVE-22957: -- [~kgyrtkirk] Thank you for the review. I have tried to address all your comments and updated the PR. Please take a look? FYI: The test failures are unrelated and passed on local run. > Support Partition Filtering In MSCK REPAIR TABLE Command > > > Key: HIVE-22957 > URL: https://issues.apache.org/jira/browse/HIVE-22957 > Project: Hive > Issue Type: Improvement >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf > > Time Spent: 4h 40m > Remaining Estimate: 0h > > *Design Doc:* > [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457137 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:38 Start Date: 10/Jul/20 12:38 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452816170 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; + +public class FileListStreamer extends Thread implements Closeable { + private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class); + private static final long TIMEOUT_IN_SECS = 5L; + private volatile boolean stop; + private final LinkedBlockingQueue cache; + private Path backingFile; + private Configuration conf; + private BufferedWriter backingFileWriter; + private volatile boolean valid = true; + private volatile boolean asyncMode = false; + private final Object COMPLETION_LOCK = new Object(); + private volatile boolean completed = false; + + + + public FileListStreamer(LinkedBlockingQueue cache, Path backingFile, Configuration conf) throws IOException { +this.cache = cache; +this.backingFile = backingFile; +this.conf = conf; +init(); + } + + private void init() throws IOException { +FileSystem fs = FileSystem.get(backingFile.toUri(), conf); +backingFileWriter = new BufferedWriter(new OutputStreamWriter(fs.create(backingFile, !asyncMode))); Review comment: I will get rid of the synchronous mode altogether as currently not needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457137) Time Spent: 2.5h (was: 2h 20m) > Memory efficient iterator should be used during replication. 
> > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with very large number of table/partitions, such iterator may > cause HS2 process to go OOM. > Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457138 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:38 Start Date: 10/Jul/20 12:38 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452816256 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ## @@ -1754,6 +1760,16 @@ public void testForeignKeys() { Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2"); Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME); +cachedKeys = sharedCache.listCachedForeignKeys( +DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), tbl1.getDbName(), tbl1.getTableName()); + +Assert.assertEquals(cachedKeys.size(), 1); +Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2"); +Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db"); +Assert.assertEquals(cachedKeys.get(0).getFktable_name(), tbl.getTableName()); +Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1"); Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457138) Time Spent: 4.5h (was: 4h 20m) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Currently table constraints are not cached. 
Hive will pull all constraints > from tables involved in a query, which results in multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
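The caching idea in the issue description can be sketched with a per-table map that absorbs the repeated metastore reads — a hypothetical illustration, not CachedStore's actual code:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstraintCache {
    // Hypothetical stand-in for the metastore's per-table constraint lists.
    private final Map<String, List<String>> primaryKeysByTable = new ConcurrentHashMap<>();
    private final AtomicInteger dbReads = new AtomicInteger();

    List<String> getPrimaryKeys(String table) {
        // One db read per table instead of one per query touching it.
        return primaryKeysByTable.computeIfAbsent(table, t -> {
            dbReads.incrementAndGet();      // simulate hitting the backing db
            return List.of(t + "_pk");      // pretend fetched constraint names
        });
    }

    public static void main(String[] args) {
        ConstraintCache cache = new ConstraintCache();
        cache.getPrimaryKeys("db.tbl");
        cache.getPrimaryKeys("db.tbl"); // second call served from cache
        System.out.println(cache.dbReads.get());
    }
}
```

Two lookups, one backing read — the same ratio the issue aims for with get_primary_keys, get_foreign_keys, and friends.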
[jira] [Work logged] (HIVE-23618) NotificationLog should also contain events for default/check constraints
[ https://issues.apache.org/jira/browse/HIVE-23618?focusedWorklogId=457135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457135 ] ASF GitHub Bot logged work on HIVE-23618: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:37 Start Date: 10/Jul/20 12:37 Worklog Time Spent: 10m Work Description: adesh-rao commented on pull request #1237: URL: https://github.com/apache/hive/pull/1237#issuecomment-656653951 @maheshk114 @pkumarsinha Can you please take a look at the PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457135) Remaining Estimate: 0h Time Spent: 10m > NotificationLog should also contain events for default/check constraints > > > Key: HIVE-23618 > URL: https://issues.apache.org/jira/browse/HIVE-23618 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This should follow similar approach of notNull/Unique constraints. This will > also include event replication for these constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457136 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:37 Start Date: 10/Jul/20 12:37 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452816142 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) { private Map parameters; private byte[] sdHash; private int otherSize; -private int tableColStatsCacheSize; -private int partitionCacheSize; -private int partitionColStatsCacheSize; -private int aggrColStatsCacheSize; + +// Arrays to hold the size/updated bit of cached objects. +// These arrays are to be referenced using MemberName enum only. +private int[] memberObjectsSize = new int[MemberName.values().length]; +private AtomicBoolean[] memberCacheUpdated = new AtomicBoolean[MemberName.values().length]; private ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock(true); // For caching column stats for an unpartitioned table // Key is column name and the value is the col stat object private Map tableColStatsCache = new ConcurrentHashMap(); -private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false); // For caching partition objects // Ket is partition values and the value is a wrapper around the partition object private Map partitionCache = new ConcurrentHashMap(); -private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false); // For caching column stats for a partitioned table // Key is aggregate of partition values, column name and the value is the col stat object private Map partitionColStatsCache = new ConcurrentHashMap(); -private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false); // For caching aggregate column stats for all and all minus default partition // Key is column name and the value is a list of 2 col stat objects // (all partitions and all but default) private Map> aggrColStatsCache = new ConcurrentHashMap>(); -private AtomicBoolean isAggrPartitionColStatsCacheDirty = new AtomicBoolean(false); + +private Map primaryKeyCache = new ConcurrentHashMap<>(); + +private Map foreignKeyCache = new ConcurrentHashMap<>(); + +private Map notNullConstraintCache = new ConcurrentHashMap<>(); + +private Map uniqueConstraintCache = new ConcurrentHashMap<>(); TableWrapper(Table t, byte[] sdHash, String location, Map parameters) { this.t = t; this.sdHash = sdHash; this.location = location; this.parameters = parameters; - this.tableColStatsCacheSize = 0; - this.partitionCacheSize = 0; - this.partitionColStatsCacheSize = 0; - this.aggrColStatsCacheSize = 0; + for(MemberName mn : MemberName.values()) { +this.memberObjectsSize[mn.getValue()] = 0; Review comment: In second thought, I think ordinal is better as we freshly load cache entries during HMS startup. So, the ordering doesn't matter. However, setting values can be a problem if someone pass incorrect value or remove an element without updating other values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457136) Time Spent: 4h 20m (was: 4h 10m) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Currently table constraints are not cached. 
Hive will pull all constraints > from tables involved in a query, which results in multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457134 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:37 Start Date: 10/Jul/20 12:37 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452815690 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; + +public class FileListStreamer extends Thread implements Closeable { Review comment: FileListStreamer is a treated as a specialized Worker and hence extending the Thread. If it would have been treated as a job then Runnable route might have been fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457134) Time Spent: 2h 20m (was: 2h 10m) > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with very large number of table/partitions, such iterator may > cause HS2 process to go OOM. > Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23618) NotificationLog should also contain events for default/check constraints
[ https://issues.apache.org/jira/browse/HIVE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23618: -- Labels: pull-request-available (was: ) > NotificationLog should also contain events for default/check constraints > > > Key: HIVE-23618 > URL: https://issues.apache.org/jira/browse/HIVE-23618 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This should follow a similar approach to the notNull/Unique constraints. This will > also include event replication for these constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457132 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:35 Start Date: 10/Jul/20 12:35 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452815017 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; + +public class FileListStreamer extends Thread implements Closeable { + private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class); + private static final long TIMEOUT_IN_SECS = 5L; + private volatile boolean stop; + private final LinkedBlockingQueue cache; + private Path backingFile; + private Configuration conf; + private BufferedWriter backingFileWriter; + private volatile boolean valid = true; + private volatile boolean asyncMode = false; + private final Object COMPLETION_LOCK = new Object(); + private volatile boolean completed = false; + + + + public FileListStreamer(LinkedBlockingQueue cache, Path backingFile, Configuration conf) throws IOException { +this.cache = cache; +this.backingFile = backingFile; +this.conf = conf; +init(); + } + + private void init() throws IOException { +FileSystem fs = FileSystem.get(backingFile.toUri(), conf); +backingFileWriter = new BufferedWriter(new OutputStreamWriter(fs.create(backingFile, !asyncMode))); +LOG.info("Initialized a file based store to save a list at: {}, ayncMode:{}", backingFile, asyncMode); + } + + public boolean isValid() { +return valid; + } + + @Override + public void close() throws IOException { +if (!asyncMode) { + closeBackingFile(); + return; +} +stop = true; +synchronized (COMPLETION_LOCK) { + while (!completed && isValid()) { +try { + COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS)); +} catch (InterruptedException e) { +} + } +} +if (!isValid()) { Review 
comment: No, it can't be moved above as this ensures the correctness of the consumption of the remaining entries from the cache. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457132) Time Spent: 2h 10m (was: 2h) > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with very large number of table/partitions, such iterator may > cause HS2 process to go OOM. > Also introduces a config option to run data copy tasks du
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457129 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:32 Start Date: 10/Jul/20 12:32 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452813518 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; + +public class FileListStreamer extends Thread implements Closeable { + private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class); + private static final long TIMEOUT_IN_SECS = 5L; + private volatile boolean stop; + private final LinkedBlockingQueue<String> cache; + private Path backingFile; + private Configuration conf; + private BufferedWriter backingFileWriter; + private volatile boolean valid = true; + private volatile boolean asyncMode = false; + private final Object COMPLETION_LOCK = new Object(); + private volatile boolean completed = false; + + + + public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, Configuration conf) throws IOException { +this.cache = cache; +this.backingFile = backingFile; +this.conf = conf; +init(); + } + + private void init() throws IOException { +FileSystem fs = FileSystem.get(backingFile.toUri(), conf); +backingFileWriter = new BufferedWriter(new OutputStreamWriter(fs.create(backingFile, !asyncMode))); +LOG.info("Initialized a file based store to save a list at: {}, asyncMode:{}", backingFile, asyncMode); + } + + public boolean isValid() { +return valid; + } + + @Override + public void close() throws IOException { +if (!asyncMode) { + closeBackingFile(); + return; +} +stop = true; +synchronized (COMPLETION_LOCK) { + while (!completed && isValid()) { +try { + COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS)); +} catch (InterruptedException e) { +} + } +} +if (!isValid()) { + throw
new IOException("File list is not in a valid state:" + backingFile); +} +LOG.info("Completed close for File List backed by {}", backingFile); + } + + public synchronized void writeInThread(String nextEntry) throws SemanticException { +try { + backingFileWriter.write(nextEntry); + backingFileWriter.newLine(); +} catch (IOException e) { + throw new SemanticException(e); +} + } + @Override + public void run() { +asyncMode = true; +boolean exThrown = false; +while (!exThrown && (!stop || !cache.isEmpty())) { + try { +String nextEntry = cache.poll(TIMEOUT_IN_SECS, TimeUnit.SECONDS); +if (nextEntry != null) { + backingFileWriter.write(nextEntry); + backingFileWriter.newLine(); + LOG.debug("Writing entry {} to file list backed by {}", nextEntry, backingFile); +} + } catch (Exception iEx) { +if (!(iEx instanceof InterruptedException)) { + // not draining any more. Inform the producer to avoid OOM. + valid = false; + LOG.error("Exception while saving the list to file " + backingFile, iEx); + exThrown = true; +} + } +} +try { + closeBackingFile(); + completed = true; +} finally { + synchronized (COMPLETION_LOCK) { +COMPLETION_LOCK.notify(); +
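The run() loop above is the heart of the streamer: it keeps polling the cache with a timeout until a stop is requested and the cache is drained, which is why the emptiness check in the loop condition cannot be moved (per the comment at the top of this digest). Below is a minimal, self-contained sketch of that pattern; it writes to an in-memory StringWriter instead of an HDFS-backed writer, and the class and field names are illustrative, not the PR's:

```java
import java.io.BufferedWriter;
import java.io.StringWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative stand-in for FileListStreamer: drains a queue to a writer on a
// background thread; "!stopRequested || !cache.isEmpty()" guarantees entries
// queued before the stop signal are still consumed.
class DrainLoopSketch extends Thread {
    private final BlockingQueue<String> cache;
    private final StringWriter out = new StringWriter();
    volatile boolean stopRequested = false;

    DrainLoopSketch(BlockingQueue<String> cache) {
        this.cache = cache;
    }

    @Override
    public void run() {
        try (BufferedWriter writer = new BufferedWriter(out)) {
            while (!stopRequested || !cache.isEmpty()) {
                // Poll with a timeout so the stop flag is re-checked periodically.
                String next = cache.poll(1, TimeUnit.SECONDS);
                if (next != null) {
                    writer.write(next);
                    writer.newLine();
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    String contents() {
        return out.toString();
    }
}
```

A producer would enqueue entries, then flip the stop flag and join the thread; everything enqueued before the flag flips still reaches the writer.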
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457128 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:31 Start Date: 10/Jul/20 12:31 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452812877 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.Closeable; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.Iterator; +import java.util.NoSuchElementException; +import java.util.concurrent.LinkedBlockingQueue; + + +/** + * A file backed list of Strings which is in-memory till the threshold. 
+ */ +public class FileList implements Closeable, Iterator<String> { Review comment: Also add concurrency tests - Can you please suggest on this? Issue Time Tracking --- Worklog Id: (was: 457128) Time Spent: 1h 50m (was: 1h 40m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
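On the "suggest concurrency tests" question, one possible shape (purely illustrative, using a plain LinkedBlockingQueue as a stand-in for the FileList cache, not the PR's actual test) is to hammer the cache from several writers and assert that no entry is lost:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of a concurrency test: N writers each add M unique entries
// concurrently; afterwards every entry must be present exactly once.
class ConcurrencyTestSketch {
    static Set<String> writeConcurrently(int writers, int perWriter) throws Exception {
        BlockingQueue<String> cache = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.submit(() -> {
                for (int i = 0; i < perWriter; i++) {
                    cache.add("w" + id + "-" + i); // unique per writer and index
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return new HashSet<>(cache); // every entry should survive concurrent writes
    }
}
```

The same skeleton could target FileList itself (writers calling add, then iterating the backing file) once READ mode is opened after close.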
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457127 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:30 Start Date: 10/Jul/20 12:30 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452812525 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.Closeable; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.Iterator; +import java.util.NoSuchElementException; +import java.util.concurrent.LinkedBlockingQueue; + + +/** + * A file backed list of Strings which is in-memory till the threshold. 
+ */ +public class FileList implements Closeable, Iterator<String> { + private static final Logger LOG = LoggerFactory.getLogger(FileList.class); + private static int fileListStreamerID = 0; + private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-"; + + private LinkedBlockingQueue<String> cache; + private volatile boolean thresholdHit = false; + private int thresholdPoint; + private float thresholdFactor = 0.9f; + private Path backingFile; + private FileListStreamer fileListStreamer; + private FileListOpMode fileListOpMode; + private String nextElement; + private boolean noMoreElement; + private HiveConf conf; + private BufferedReader backingFileReader; + private volatile boolean asyncMode; + + + /** + * To be used only for READ mode; + */ + public FileList(Path backingFile, HiveConf conf) { Review comment: It would be risky to operate on the same file in both READ and WRITE mode at the same time, hence the modes are there to prevent that. Issue Time Tracking --- Worklog Id: (was: 457127) Time Spent: 1h 40m (was: 1.5h) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457126 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:29 Start Date: 10/Jul/20 12:29 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452812001 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.Closeable; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.Iterator; +import java.util.NoSuchElementException; +import java.util.concurrent.LinkedBlockingQueue; + + +/** + * A file backed list of Strings which is in-memory till the threshold. 
+ */ +public class FileList implements Closeable, Iterator<String> { + private static final Logger LOG = LoggerFactory.getLogger(FileList.class); + private static int fileListStreamerID = 0; + private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-"; + + private LinkedBlockingQueue<String> cache; + private volatile boolean thresholdHit = false; + private int thresholdPoint; + private float thresholdFactor = 0.9f; + private Path backingFile; + private FileListStreamer fileListStreamer; + private FileListOpMode fileListOpMode; + private String nextElement; + private boolean noMoreElement; + private HiveConf conf; + private BufferedReader backingFileReader; + private volatile boolean asyncMode; + + + /** + * To be used only for READ mode; + */ + public FileList(Path backingFile, HiveConf conf) { +this.backingFile = backingFile; +thresholdHit = true; +fileListOpMode = FileListOpMode.READ; +this.conf = conf; + } + + /** + * To be used only for WRITE mode; + */ + public FileList(Path backingFile, int cacheSize, HiveConf conf, boolean asyncMode) throws IOException { Review comment: If it is called otherwise, it won't allow it to be used anyway. Issue Time Tracking --- Worklog Id: (was: 457126) Time Spent: 1.5h (was: 1h 20m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457124 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:28 Start Date: 10/Jul/20 12:28 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452811664 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -591,14 +590,25 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive } } dumpTableListToDumpLocation(tableList, dumpRoot, dbName, conf); - extTableCopyWorks = dirLocationsToCopy(extTableLocations); } -work.setDirCopyIterator(extTableCopyWorks.iterator()); -work.setManagedTableCopyPathIterator(managedTableCopyPaths.iterator()); +setDataCopyIterators(extTableFileList, managedTblList); work.getMetricCollector().reportStageEnd(getName(), Status.SUCCESS, lastReplId); return lastReplId; } + private void setDataCopyIterators(FileList extTableFileList, FileList managedTableFileList) throws IOException { +boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY); +extTableFileList.close(); Review comment: Close makes sure that everything is flushed out and the list can be used in READ mode. Issue Time Tracking --- Worklog Id: (was: 457124) Time Spent: 1h 20m (was: 1h 10m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457122 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:26 Start Date: 10/Jul/20 12:26 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452810700 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.Closeable; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.Iterator; +import java.util.NoSuchElementException; +import java.util.concurrent.LinkedBlockingQueue; + + +/** + * A file backed list of Strings which is in-memory till the threshold. 
+ */ +public class FileList implements Closeable, Iterator<String> { + private static final Logger LOG = LoggerFactory.getLogger(FileList.class); + private static int fileListStreamerID = 0; + private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-"; + + private LinkedBlockingQueue<String> cache; + private volatile boolean thresholdHit = false; + private int thresholdPoint; Review comment: thresholdHit is a boolean which, once set, is used to take action; thresholdPoint is the value after which thresholdHit is set. Issue Time Tracking --- Worklog Id: (was: 457122) Time Spent: 1h 10m (was: 1h) -- This message was sent by Atlassian Jira (v8.3.4#803005)
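The thresholdPoint/thresholdHit interplay described above can be sketched as follows. This is a simplification covering only the spill decision (the field names come from the PR, the rest is hypothetical; the actual FileList also starts the streamer thread once the threshold is hit):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of FileList's spill decision: entries go to the in-memory
// cache until its size reaches thresholdPoint (a factor of the configured
// cache size); after that, thresholdHit flips and entries would be streamed
// to the backing file instead (the file side is elided here).
class ThresholdSketch {
    private final LinkedBlockingQueue<String> cache;
    private final int thresholdPoint;
    private boolean thresholdHit = false;

    ThresholdSketch(int cacheSize, float thresholdFactor) {
        this.cache = new LinkedBlockingQueue<>(cacheSize);
        this.thresholdPoint = (int) (cacheSize * thresholdFactor);
    }

    boolean add(String entry) {
        if (!thresholdHit && cache.size() + 1 >= thresholdPoint) {
            thresholdHit = true; // from here on, the real FileList streams to the backing file
        }
        return cache.offer(entry);
    }

    boolean isThresholdHit() {
        return thresholdHit;
    }
}
```

Keeping the flip point below the hard capacity gives the producer headroom to keep writing while the streamer spins up, which is presumably why a factor like 0.9 is used rather than the full cache size.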
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457118 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:24 Start Date: 10/Jul/20 12:24 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452810053 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -465,9 +463,13 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive String validTxnList = null; long waitUntilTime = 0; long bootDumpBeginReplId = -1; -List managedTableCopyPaths = Collections.emptyList(); -List extTableCopyWorks = Collections.emptyList(); + +int cacheSize = conf.getIntVar(HiveConf.ConfVars.REPL_FILE_LIST_CACHE_SIZE); Review comment: The cache is rebuilt, and in case of a file, it should be overwritten. Issue Time Tracking --- Worklog Id: (was: 457118) Time Spent: 1h (was: 50m)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457115 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:23 Start Date: 10/Jul/20 12:23 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452809466 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java ## @@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws IOException, LoginException { throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage())); } } + + /** + * This needs the root data directory to which the data needs to be exported to. + * The data export here is a list of files either in table/partition that are written to the _files + * in the exportRootDataDir provided. + */ + private void exportFilesAsList() throws SemanticException, IOException, LoginException { +if (dataPathList.isEmpty()) { + return; +} +boolean done = false; +int repeat = 0; +while (!done) { + // This is only called for replication that handles MM tables; no need for mmCtx. 
+ try (BufferedWriter writer = writer()) { +for (Path dataPath : dataPathList) { + writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath)); +} +done = true; + } catch (IOException e) { +if (e instanceof FileNotFoundException) { + logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed"); + throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage())); +} +repeat++; +logger.info("writeFilesList failed", e); +if (repeat >= FileUtils.MAX_IO_ERROR_RETRY) { + logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed"); + throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg()); +} + +int sleepTime = FileUtils.getSleepTime(repeat - 1); +logger.info(" sleep for {} milliseconds for retry num {} ", sleepTime , repeat); +try { + Thread.sleep(sleepTime); +} catch (InterruptedException timerEx) { + logger.info("thread sleep interrupted: {}", timerEx.getMessage()); +} + +// in case of io error, reset the file system object +FileSystem.closeAllForUGI(Utils.getUGI()); +dataFileSystem = dataPathList.get(0).getFileSystem(hiveConf); +exportFileSystem = exportRootDataDir.getFileSystem(hiveConf); +Path exportPath = new Path(exportRootDataDir, EximUtil.FILES_NAME); +if (exportFileSystem.exists(exportPath)) { + exportFileSystem.delete(exportPath, true); +} + } +} + } + + private void writeFilesList(FileStatus[] fileStatuses, BufferedWriter writer, String encodedSubDirs) + throws IOException { +ReplChangeManager replChangeManager = ReplChangeManager.getInstance(); Review comment: Can you please elaborate? Didn't get which parameter you are referring to. Issue Time Tracking --- Worklog Id: (was: 457115) Time Spent: 50m (was: 40m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
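The retry scaffolding in exportFilesAsList above (bounded by FileUtils.MAX_IO_ERROR_RETRY, with a sleep that grows via FileUtils.getSleepTime, and a filesystem reset between attempts) boils down to a bounded retry-with-backoff loop. The sketch below is a generic stand-in with hypothetical names, not Hive's actual utility:

```java
import java.io.IOException;

// Generic bounded retry-with-backoff loop, mirroring the shape of
// exportFilesAsList: retry transient IOExceptions, rethrow after maxRetries.
class RetrySketch {
    interface IOAction {
        void run() throws IOException;
    }

    static int runWithRetry(IOAction action, int maxRetries, long baseSleepMillis) throws IOException {
        int repeat = 0;
        while (true) {
            try {
                action.run();
                return repeat; // number of failed attempts before success
            } catch (IOException e) {
                repeat++;
                if (repeat >= maxRetries) {
                    throw e; // retries exhausted, surface the error
                }
                try {
                    // sleep grows with the attempt count (stand-in for getSleepTime)
                    Thread.sleep(baseSleepMillis * repeat);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }
}
```

The FileNotFoundException short-circuit in the real code is the usual refinement: non-transient errors are rethrown immediately instead of burning retries.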
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457113 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:22 Start Date: 10/Jul/20 12:22 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452808706 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java ## @@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws IOException, LoginException { throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage())); } } + + /** + * This needs the root data directory to which the data needs to be exported to. + * The data export here is a list of files either in table/partition that are written to the _files + * in the exportRootDataDir provided. + */ + private void exportFilesAsList() throws SemanticException, IOException, LoginException { +if (dataPathList.isEmpty()) { + return; +} +boolean done = false; +int repeat = 0; +while (!done) { + // This is only called for replication that handles MM tables; no need for mmCtx. 
+ try (BufferedWriter writer = writer()) { +for (Path dataPath : dataPathList) { + writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath)); +} +done = true; + } catch (IOException e) { +if (e instanceof FileNotFoundException) { + logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed"); + throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage())); +} +repeat++; +logger.info("writeFilesList failed", e); +if (repeat >= FileUtils.MAX_IO_ERROR_RETRY) { + logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed"); + throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg()); +} + +int sleepTime = FileUtils.getSleepTime(repeat - 1); +logger.info(" sleep for {} milliseconds for retry num {} ", sleepTime , repeat); +try { + Thread.sleep(sleepTime); +} catch (InterruptedException timerEx) { + logger.info("thread sleep interrupted", timerEx.getMessage()); +} + +// in case of io error, reset the file system object +FileSystem.closeAllForUGI(Utils.getUGI()); +dataFileSystem = dataPathList.get(0).getFileSystem(hiveConf); +exportFileSystem = exportRootDataDir.getFileSystem(hiveConf); +Path exportPath = new Path(exportRootDataDir, EximUtil.FILES_NAME); +if (exportFileSystem.exists(exportPath)) { + exportFileSystem.delete(exportPath, true); +} + } +} + } + + private void writeFilesList(FileStatus[] fileStatuses, BufferedWriter writer, String encodedSubDirs) + throws IOException { +ReplChangeManager replChangeManager = ReplChangeManager.getInstance(); +for (FileStatus fileStatus : fileStatuses) { + if (fileStatus.isDirectory()) { +// Write files inside the sub-directory. 
+Path subDir = fileStatus.getPath(); +writeFilesList(listFilesInDir(subDir), writer, encodedSubDir(encodedSubDirs, subDir)); + } else { +writer.write(encodedUri(replChangeManager, fileStatus, encodedSubDirs)); +writer.newLine(); + } +} + } + + private BufferedWriter writer() throws IOException { +Path exportToFile = new Path(exportRootDataDir, EximUtil.FILES_NAME); +logger.debug("exporting data files in dir : " + dataPathList + " to " + exportToFile); +return new BufferedWriter( +new OutputStreamWriter(exportFileSystem.create(exportToFile)) +); + } + + private String encodedSubDir(String encodedParentDirs, Path subDir) { +if (null == encodedParentDirs) { + return subDir.getName(); +} else { + return encodedParentDirs + Path.SEPARATOR + subDir.getName(); +} + } + + private String encodedUri(ReplChangeManager replChangeManager, FileStatus fileStatus, String encodedSubDir) + throws IOException { +Path currentDataFilePath = fileStatus.getPath(); +String checkSum = ReplChangeManager.checksumFor(currentDataFilePath, dataFileSystem); Review comment: Which method? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking -
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457109 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:16 Start Date: 10/Jul/20 12:16 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452806379 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ## @@ -1754,6 +1760,16 @@ public void testForeignKeys() { Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2"); Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME); +cachedKeys = sharedCache.listCachedForeignKeys( +DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), tbl1.getDbName(), tbl1.getTableName()); + +Assert.assertEquals(cachedKeys.size(), 1); +Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2"); +Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db"); +Assert.assertEquals(cachedKeys.get(0).getFktable_name(), tbl.getTableName()); +Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1"); Review comment: Also validate if parent tbl key is proper too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457109) Time Spent: 4h (was: 3h 50m) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Currently table constraints are not cached. 
Hive will pull all constraints > from tables involved in the query, which results in multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457110 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:16 Start Date: 10/Jul/20 12:16 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452806379 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ## @@ -1754,6 +1760,16 @@ public void testForeignKeys() { Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2"); Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME); +cachedKeys = sharedCache.listCachedForeignKeys( +DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), tbl1.getDbName(), tbl1.getTableName()); + +Assert.assertEquals(cachedKeys.size(), 1); +Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2"); +Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db"); +Assert.assertEquals(cachedKeys.get(0).getFktable_name(), tbl.getTableName()); +Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1"); Review comment: Also validate if parent tbl name is proper too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457110) Time Spent: 4h 10m (was: 4h) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Currently table constraints are not cached. 
Hive will pull all constraints > from tables involved in the query, which results in multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457108 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:12 Start Date: 10/Jul/20 12:12 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r452798321 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -591,14 +590,25 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive } } dumpTableListToDumpLocation(tableList, dumpRoot, dbName, conf); - extTableCopyWorks = dirLocationsToCopy(extTableLocations); } -work.setDirCopyIterator(extTableCopyWorks.iterator()); -work.setManagedTableCopyPathIterator(managedTableCopyPaths.iterator()); +setDataCopyIterators(extTableFileList, managedTblList); work.getMetricCollector().reportStageEnd(getName(), Status.SUCCESS, lastReplId); return lastReplId; } + private void setDataCopyIterators(FileList extTableFileList, FileList managedTableFileList) throws IOException { +boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY); +extTableFileList.close(); Review comment: Is this serving the purpose of flush? Its not clear why close is called before setting the iterator. Needs to be simplified. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.Closeable; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.Iterator; +import java.util.NoSuchElementException; +import java.util.concurrent.LinkedBlockingQueue; + + +/** + * A file backed list of Strings which is in-memory till the threshold. + */ +public class FileList implements Closeable, Iterator { Review comment: Add UTs ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.repl.util; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.parse.SemanticException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; + +public class FileListStreamer extends Thread implements Closeable { + private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class); + private static final long TIMEOUT_IN_SECS = 5L; + private volatile boolean stop; + private final Lin
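The FileList under review is described as a file-backed list of Strings that stays in memory until a threshold is reached, with FileListStreamer draining entries to the backing file on a separate thread. As a rough illustration of the spill idea only — class and method names here are hypothetical, and the real patch streams entries through a LinkedBlockingQueue rather than writing synchronously — a minimal sketch:

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: entries stay in memory until a threshold, then spill
// to a backing file; reading replays memory first, then the file. Not the
// actual Hive implementation.
public class SpillingList implements Closeable {
    private final int threshold;
    private final Path backingFile;
    private final List<String> inMemory = new ArrayList<>();
    private BufferedWriter writer; // opened lazily on first spill

    public SpillingList(int threshold, Path backingFile) {
        this.threshold = threshold;
        this.backingFile = backingFile;
    }

    public void add(String entry) throws IOException {
        if (inMemory.size() < threshold) {
            inMemory.add(entry);
            return;
        }
        if (writer == null) {
            writer = Files.newBufferedWriter(backingFile);
        }
        writer.write(entry);
        writer.newLine();
    }

    // Flush pending writes (close acts as the flush, which is the role the
    // reviewer asks about above), then return all entries in insertion order.
    public List<String> readAll() throws IOException {
        close();
        List<String> all = new ArrayList<>(inMemory);
        if (Files.exists(backingFile)) {
            all.addAll(Files.readAllLines(backingFile));
        }
        return all;
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
        }
    }
}
```

In this reduced form it is visible why close() must precede iteration: the buffered writer may still hold tail entries that the reader would otherwise miss.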
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457105 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 12:01 Start Date: 10/Jul/20 12:01 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452799879 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) { private Map parameters; private byte[] sdHash; private int otherSize; -private int tableColStatsCacheSize; -private int partitionCacheSize; -private int partitionColStatsCacheSize; -private int aggrColStatsCacheSize; + +// Arrays to hold the size/updated bit of cached objects. +// These arrays are to be referenced using MemberName enum only. +private int[] memberObjectsSize = new int[MemberName.values().length]; +private AtomicBoolean[] memberCacheUpdated = new AtomicBoolean[MemberName.values().length]; private ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock(true); // For caching column stats for an unpartitioned table // Key is column name and the value is the col stat object private Map tableColStatsCache = new ConcurrentHashMap(); -private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false); // For caching partition objects // Ket is partition values and the value is a wrapper around the partition object private Map partitionCache = new ConcurrentHashMap(); -private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false); // For caching column stats for a partitioned table // Key is aggregate of partition values, column name and the value is the col stat object private Map partitionColStatsCache = new ConcurrentHashMap(); -private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false); // For caching aggregate column stats for all and all minus default partition // Key is column name and the value is a list of 2 col stat objects // (all partitions and all but default) private Map> aggrColStatsCache = new ConcurrentHashMap>(); -private AtomicBoolean isAggrPartitionColStatsCacheDirty = new AtomicBoolean(false); + +private Map primaryKeyCache = new ConcurrentHashMap<>(); + +private Map foreignKeyCache = new ConcurrentHashMap<>(); + +private Map notNullConstraintCache = new ConcurrentHashMap<>(); + +private Map uniqueConstraintCache = new ConcurrentHashMap<>(); TableWrapper(Table t, byte[] sdHash, String location, Map parameters) { this.t = t; this.sdHash = sdHash; this.location = location; this.parameters = parameters; - this.tableColStatsCacheSize = 0; - this.partitionCacheSize = 0; - this.partitionColStatsCacheSize = 0; - this.aggrColStatsCacheSize = 0; + for(MemberName mn : MemberName.values()) { +this.memberObjectsSize[mn.getValue()] = 0; Review comment: Java treats enum as objects. Array indexes can be integers only. Therefore, I have to use mn.getValue() only. PS: Enum also provides `ordinal` method that returns the position of enum member, but that can cause issues if order is changed. So, I decided to go ahead with creating own getValue() method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457105) Time Spent: 3h 50m (was: 3h 40m) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Currently table constraints are not cached. Hive will pull all constraints > from tables involved in query, which results multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
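The getValue() choice defended in the comment above — an explicit per-member index instead of ordinal(), so that reordering enum declarations cannot silently remap array slots — can be sketched as follows (a simplified stand-in, not the actual SharedCache code):

```java
// Hypothetical sketch of the pattern discussed: each enum member carries an
// explicit index, so array slots stay stable even if declaration order changes.
public class EnumIndexedSizes {
    public enum MemberName {
        PRIMARY_KEY_CACHE(0),
        FOREIGN_KEY_CACHE(1),
        UNIQUE_CONSTRAINT_CACHE(2),
        NOTNULL_CONSTRAINT_CACHE(3);

        private final int value;
        MemberName(int value) { this.value = value; }
        public int getValue() { return value; }
    }

    private final int[] memberObjectsSize = new int[MemberName.values().length];

    public void updateSize(MemberName mn, int delta) {
        memberObjectsSize[mn.getValue()] += delta;
    }

    public int getSize(MemberName mn) {
        return memberObjectsSize[mn.getValue()];
    }

    public static void main(String[] args) {
        EnumIndexedSizes sizes = new EnumIndexedSizes();
        sizes.updateSize(MemberName.FOREIGN_KEY_CACHE, 128);
        sizes.updateSize(MemberName.FOREIGN_KEY_CACHE, -28);
        System.out.println(sizes.getSize(MemberName.FOREIGN_KEY_CACHE)); // prints 100
    }
}
```

Unlike ordinal(), the explicit value survives a reordering of the enum body, at the cost of keeping the hand-assigned indexes dense and unique.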
[jira] [Commented] (HIVE-23237) Display HiveServer2 hostname in the operation logs
[ https://issues.apache.org/jira/browse/HIVE-23237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155396#comment-17155396 ] Zhihua Deng commented on HIVE-23237: This may be resolved by https://issues.apache.org/jira/browse/HIVE-23722 > Display HiveServer2 hostname in the operation logs > -- > > Key: HIVE-23237 > URL: https://issues.apache.org/jira/browse/HIVE-23237 > Project: Hive > Issue Type: Improvement >Reporter: Miklos Szurap >Priority: Major > Labels: supportability > > Hive deployments often have an external load-balancer in front of multiple > HiveServer2 instances. > In such cases the client does not know which HiveServer2 it is connected to. > If there are some issues, all HiveServer2 logs have to be searched for clues > instead of directly going to the right host. It would be great if the HS2 > hostname were logged to the client logs (for example to beeline's output). > We can "work around" this by executing "set > hive.server2.thrift.bind.host;" however that requires an explicit > modification to every application. > Can we print this information in the operation logs and that way stream it > back to the client? > Likely some users or customers do not want to expose that, so the behavior > should be configurable. > This could make the issue/error investigation much easier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-20441) NPE in ExprNodeGenericFuncDesc when hive.allow.udf.load.on.demand is set to true
[ https://issues.apache.org/jira/browse/HIVE-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153437#comment-17153437 ] Zhihua Deng edited comment on HIVE-20441 at 7/10/20, 11:38 AM: --- ...The problem may still be there in the trunk, [~BIGrey] are you still working on this? If not, can I take over? Thanks! was (Author: dengzh): ...The problem may still be there in the trunk, [~BIGrey] are you still working on this? > NPE in ExprNodeGenericFuncDesc when hive.allow.udf.load.on.demand is set to > true > - > > Key: HIVE-20441 > URL: https://issues.apache.org/jira/browse/HIVE-20441 > Project: Hive > Issue Type: Bug > Components: CLI, HiveServer2 >Affects Versions: 1.2.1, 2.3.3 >Reporter: Hui Huang >Assignee: Hui Huang >Priority: Major > Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, > HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch > > > When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been > started, the newly created function from other clients or hiveserver2 will be > loaded from the metastore the first time it is used.
> When the udf is used in where clause, we got a NPE like: > {code:java} > Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: NullPointerException null > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP > SHOT] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO > T] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA > PSHOT] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57) > ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_77] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_77] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77] > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104) > ~[hive-exec-2. > 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1359) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2. > 3.4-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT] >
[jira] [Assigned] (HIVE-23835) Repl Dump should dump function binaries to staging directory
[ https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha reassigned HIVE-23835: --- > Repl Dump should dump function binaries to staging directory > > > Key: HIVE-23835 > URL: https://issues.apache.org/jira/browse/HIVE-23835 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > > {color:#172b4d}When hive function's binaries are on source HDFS, repl dump > should dump it to the staging location in order to break cross clusters > visibility requirement.{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=457097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457097 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 10/Jul/20 11:27 Start Date: 10/Jul/20 11:27 Worklog Time Spent: 10m Work Description: klcopp merged pull request #1216: URL: https://github.com/apache/hive/pull/1216 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457097) Time Spent: 2h (was: 1h 50m) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457096 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 11:11 Start Date: 10/Jul/20 11:11 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452779486 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, SharedCache sharedCache, bool } } +boolean cachePrimaryKeys(List primaryKeys, boolean fromPrewarm) { + return cacheConstraints(primaryKeys, fromPrewarm, MemberName.PRIMARY_KEY_CACHE); +} + +boolean cacheForeignKeys(List foreignKeys, boolean fromPrewarm) { + return cacheConstraints(foreignKeys, fromPrewarm, MemberName.FOREIGN_KEY_CACHE); +} + +boolean cacheUniqueConstraints(List uniqueConstraints, boolean fromPrewarm) { + return cacheConstraints(uniqueConstraints, fromPrewarm, MemberName.UNIQUE_CONSTRAINT_CACHE); +} + +boolean cacheNotNullConstraints(List notNullConstraints, boolean fromPrewarm) { + return cacheConstraints(notNullConstraints, fromPrewarm, MemberName.NOTNULL_CONSTRAINT_CACHE); +} + +// Common method to cache constraints +private boolean cacheConstraints(List constraintsList, + boolean fromPrewarm, + MemberName mn) { + if (constraintsList == null || constraintsList.isEmpty()) { +return true; + } + try { +tableLock.writeLock().lock(); +final int[] size = {0}; Review comment: This is being used inside lambda function. It requires the variable to be used as final. Because of this, I can't use int or Integer. So I chose int array instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457096) Time Spent: 3h 40m (was: 3.5h) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently table constraints are not cached. Hive will pull all constraints > from tables involved in query, which results multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
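The `final int[] size = {0}` idiom defended in the comment above exists because Java lambdas may only capture local variables that are final or effectively final: the reference to the one-element array is final, while its single slot stays mutable. A minimal standalone illustration (hypothetical names, not the SharedCache code):

```java
import java.util.Arrays;
import java.util.List;

public class LambdaCounter {
    // Sum string lengths inside a forEach lambda. A plain local int could not
    // be reassigned from the lambda body (it must be effectively final), so a
    // one-element array provides a final reference with a mutable slot.
    public static int totalLength(List<String> items) {
        final int[] size = {0};
        items.forEach(s -> size[0] += s.length());
        return size[0];
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("pk", "fk2"))); // prints 5
    }
}
```

AtomicInteger would work equally well here; the array is simply the cheaper choice when no cross-thread visibility is needed.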
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457095 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 11:07 Start Date: 10/Jul/20 11:07 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r45293 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -2490,26 +2616,99 @@ long getPartsFound() { @Override public List getPrimaryKeys(String catName, String dbName, String tblName) throws MetaException { -// TODO constraintCache -return rawStore.getPrimaryKeys(catName, dbName, tblName); +catName = normalizeIdentifier(catName); +dbName = StringUtils.normalizeIdentifier(dbName); +tblName = StringUtils.normalizeIdentifier(tblName); +if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) { + return rawStore.getPrimaryKeys(catName, dbName, tblName); +} + +Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName); +if (tbl == null) { + // The table containing the primary keys is not yet loaded in cache + return rawStore.getPrimaryKeys(catName, dbName, tblName); +} +List keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName); +if (keys == null || keys.isEmpty()) { Review comment: Created a follow up jira. https://issues.apache.org/jira/browse/HIVE-23834 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457095) Time Spent: 3.5h (was: 3h 20m) > [CachedStore] Cache table constraints in CachedStore > > > Key: HIVE-22015 > URL: https://issues.apache.org/jira/browse/HIVE-22015 > Project: Hive > Issue Type: Sub-task >Reporter: Daniel Dai >Assignee: Adesh Kumar Rao >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently table constraints are not cached. Hive will pull all constraints > from tables involved in query, which results multiple db reads (including > get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort > to cache this is small as it's just another table component. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23834) [CachedStore] Add flag in TableWrapper in CacheStore to check if constraints are set or not
[ https://issues.apache.org/jira/browse/HIVE-23834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adesh Kumar Rao reassigned HIVE-23834: -- > [CachedStore] Add flag in TableWrapper in CacheStore to check if constraints > are set or not > --- > > Key: HIVE-23834 > URL: https://issues.apache.org/jira/browse/HIVE-23834 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457093 ] ASF GitHub Bot logged work on HIVE-22015: - Author: ASF GitHub Bot Created on: 10/Jul/20 11:02 Start Date: 10/Jul/20 11:02 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r452771399 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -514,6 +629,131 @@ public boolean containsPartition(List partVals) { return containsPart; } +public void removeConstraint(String name) { + try { +tableLock.writeLock().lock(); +Object constraint = null; +MemberName mn = null; +Class constraintClass = null; +name = name.toLowerCase(); +if (this.primaryKeyCache.containsKey(name)) { + constraint = this.primaryKeyCache.remove(name); + mn = MemberName.PRIMARY_KEY_CACHE; + constraintClass = SQLPrimaryKey.class; +} else if (this.foreignKeyCache.containsKey(name)) { + constraint = this.foreignKeyCache.remove(name); + mn = MemberName.FOREIGN_KEY_CACHE; + constraintClass = SQLForeignKey.class; +} else if (this.notNullConstraintCache.containsKey(name)) { + constraint = this.notNullConstraintCache.remove(name); + mn = MemberName.NOTNULL_CONSTRAINT_CACHE; + constraintClass = SQLNotNullConstraint.class; +} else if (this.uniqueConstraintCache.containsKey(name)) { + constraint = this.uniqueConstraintCache.remove(name); + mn = MemberName.UNIQUE_CONSTRAINT_CACHE; + constraintClass = SQLUniqueConstraint.class; +} + +if(constraint == null) { + LOG.debug("Constraint: " + name + " does not exist in cache."); + return; +} +setMemberCacheUpdated(mn, true); +int size = getObjectSize(constraintClass, constraint); +updateMemberSize(mn, -1 * size, SizeMode.Delta); + } finally { +tableLock.writeLock().unlock(); + } +} + +public void refreshPrimaryKeys(List keys) { + Map newKeys 
= new ConcurrentHashMap<>(); + try { +tableLock.writeLock().lock(); +int size = 0; +for (SQLPrimaryKey key : keys) { + if (compareAndSetMemberCacheUpdated(MemberName.PRIMARY_KEY_CACHE, true, false)) { +LOG.debug("Skipping primary key cache update for table: " + getTable().getTableName() ++ "; the primary keys we have is dirty."); +return; + } + newKeys.put(key.getPk_name().toLowerCase(), key); + size += getObjectSize(SQLPrimaryKey.class, key); +} +primaryKeyCache = newKeys; +updateMemberSize(MemberName.PRIMARY_KEY_CACHE, size, SizeMode.Snapshot); +LOG.debug("Primary keys refresh in cache was successful."); Review comment: Shall add catalog, db and table names in the log msg otherwise this is no use. Same for other methods too. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -2490,26 +2616,99 @@ long getPartsFound() { @Override public List getPrimaryKeys(String catName, String dbName, String tblName) throws MetaException { -// TODO constraintCache -return rawStore.getPrimaryKeys(catName, dbName, tblName); +catName = normalizeIdentifier(catName); +dbName = StringUtils.normalizeIdentifier(dbName); +tblName = StringUtils.normalizeIdentifier(tblName); +if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) { + return rawStore.getPrimaryKeys(catName, dbName, tblName); +} + +Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName); +if (tbl == null) { + // The table containing the primary keys is not yet loaded in cache + return rawStore.getPrimaryKeys(catName, dbName, tblName); +} +List keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName); +if (keys == null || keys.isEmpty()) { Review comment: Can we have a flag in TableWrapper in Cache to tell if it was set or not? Can be a follow-up jira. 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java ## @@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, SharedCache sharedCache, bool } } +boolean cachePrimaryKeys(List primaryKeys, boolean fromPrewarm) { + return cacheConstraints(primaryKeys, fromPrewarm, MemberName.PRIMARY_KEY_CACHE); +} + +boolean cacheForeignKeys(L
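The refreshPrimaryKeys snippet above relies on a compare-and-set over a dirty bit: a concurrent write marks the cache dirty, and a refresher that finds the bit set discards its (possibly stale) snapshot instead of overwriting the newer data. A reduced sketch of that pattern, with hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the dirty-bit refresh pattern in the patch: writers
// set the flag; a refresher aborts if compareAndSet observes it set, because
// its snapshot may predate the concurrent write.
public class DirtyFlagCache {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private volatile Map<String, String> cache = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        cache.put(key, value);
        dirty.set(true); // newer than any in-flight snapshot
    }

    // Returns true if the refresh was applied, false if skipped as stale.
    public boolean refresh(Map<String, String> snapshot) {
        if (dirty.compareAndSet(true, false)) {
            return false; // a write raced us; keep the current cache
        }
        cache = new ConcurrentHashMap<>(snapshot);
        return true;
    }

    public String get(String key) {
        return cache.get(key);
    }
}
```

The compareAndSet both tests and clears the flag atomically, so exactly one skipped refresh "pays" for each racing write.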
[jira] [Work started] (HIVE-23695) [CachedStore] Add unique/default constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-23695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23695 started by Ashish Sharma. > [CachedStore] Add unique/default constraints in CachedStore > --- > > Key: HIVE-23695 > URL: https://issues.apache.org/jira/browse/HIVE-23695 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Adesh Kumar Rao >Assignee: Ashish Sharma >Priority: Major > Fix For: 4.0.0 > > > This is blocked by HIVE-23618 (notification events are not generated for > default/unique constraints, hence created a separate sub-task from > HIVE-22015). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457080 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 10/Jul/20 09:45 Start Date: 10/Jul/20 09:45 Worklog Time Spent: 10m Work Description: dengzhhu653 edited a comment on pull request #1149: URL: https://github.com/apache/hive/pull/1149#issuecomment-656586562 @belugabehr @kgyrtkirk could you please review? a small fix on the log output, thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457080) Time Spent: 2h 40m (was: 2.5h) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
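The issue description reduces to a reachability argument: a nested state == CANCELED check inside a branch guarded by state != CANCELED is dead code, ignoring any concurrent state change between the two reads. A hypothetical reduction (not the actual SQLOperation code):

```java
public class DeadBranchDemo {
    public enum OperationState { RUNNING, CANCELED, TIMEDOUT }

    // Inside a branch entered only when state != CANCELED, a nested check for
    // state == CANCELED can never be true (assuming state is read once).
    public static String cancelMessage(boolean async, OperationState state) {
        if (async && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) {
            if (state == OperationState.CANCELED) {
                return "cancel log"; // unreachable: outer guard excludes CANCELED
            }
            return "background task cancelled";
        }
        return "no-op";
    }

    public static void main(String[] args) {
        System.out.println(cancelMessage(true, OperationState.RUNNING));
    }
}
```

The compiler cannot flag this (it is not statically unreachable in the JLS sense), which is why such branches survive until a human review like this one.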
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457077 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 10/Jul/20 09:41 Start Date: 10/Jul/20 09:41 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1149: URL: https://github.com/apache/hive/pull/1149#issuecomment-656586562 @belugabehr @kgyrtkirk could you please review? a small fix on the log output. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457077) Time Spent: 2.5h (was: 2h 20m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457076 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 10/Jul/20 09:40 Start Date: 10/Jul/20 09:40 Worklog Time Spent: 10m Work Description: dengzhhu653 removed a comment on pull request #1149: URL: https://github.com/apache/hive/pull/1149#issuecomment-648507858 @belugabehr can you please take a look at the changes? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457076) Time Spent: 2h 20m (was: 2h 10m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command
[ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-22957: - Attachment: (was: HIVE-22957.03.patch) > Support Partition Filtering In MSCK REPAIR TABLE Command > > > Key: HIVE-22957 > URL: https://issues.apache.org/jira/browse/HIVE-22957 > Project: Hive > Issue Type: Improvement >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf > > Time Spent: 4h 40m > Remaining Estimate: 0h > > *Design Doc:* > [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command
[ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-22957: - Attachment: (was: HIVE-22957.02.patch) > Support Partition Filtering In MSCK REPAIR TABLE Command > > > Key: HIVE-22957 > URL: https://issues.apache.org/jira/browse/HIVE-22957 > Project: Hive > Issue Type: Improvement >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf > > Time Spent: 4h 40m > Remaining Estimate: 0h > > *Design Doc:* > [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23737: - Attachment: (was: HIVE-23737.01.patch) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: > 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed task attempt shuffle data cleanup; refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23737: - Attachment: (was: HIVE-23737.02.patch) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: > 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed task attempt shuffle data cleanup; refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command
[ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-22957: - Attachment: (was: HIVE-22957.01.patch) > Support Partition Filtering In MSCK REPAIR TABLE Command > > > Key: HIVE-22957 > URL: https://issues.apache.org/jira/browse/HIVE-22957 > Project: Hive > Issue Type: Improvement >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf > > Time Spent: 4h 40m > Remaining Estimate: 0h > > *Design Doc:* > [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended
[ https://issues.apache.org/jira/browse/HIVE-23806?focusedWorklogId=457068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457068 ] ASF GitHub Bot logged work on HIVE-23806: - Author: ASF GitHub Bot Created on: 10/Jul/20 09:14 Start Date: 10/Jul/20 09:14 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1215: URL: https://github.com/apache/hive/pull/1215#discussion_r452724372 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -501,6 +502,28 @@ public static boolean areSameColumns(List oldCols, List p, List s) { +if (p == s) { + return true; +} +if (p.size() > s.size()) { + return false; +} +Iterator itP = p.iterator(); Review comment: @kgyrtkirk Maybe we can use `ListUtils.isEqualList(p, s.subList(0, p.size()))`? that way we can avoid most of the code here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457068) Time Spent: 0.5h (was: 20m) > Avoid clearing column stat states in all partition in case schema is extended > - > > Key: HIVE-23806 > URL: https://issues.apache.org/jira/browse/HIVE-23806 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > in case there are many partitions, adding a new column without cascade may > take a while, because we want to make sure in schema evolution cases that we > don't reuse stats later on by mistake... > however this is not necessary in case the schema is extended -- This message was sent by Atlassian Jira (v8.3.4#803005)
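The reviewer's suggestion above, comparing the old column list against the equal-length prefix of the new one, can be sketched with plain java.util lists. isPrefixOf is an illustrative name, not the real MetaStoreServerUtils method; p.equals(s.subList(0, p.size())) behaves like commons-collections ListUtils.isEqualList for non-null lists:

```java
import java.util.List;

public class PrefixCheckSketch {
    // True when the old column list p is a prefix of the extended list s,
    // i.e. the schema change only appended columns at the end.
    // Equivalent to ListUtils.isEqualList(p, s.subList(0, p.size()))
    // from commons-collections, but using only java.util.
    public static <T> boolean isPrefixOf(List<T> p, List<T> s) {
        if (p == s) {
            return true;
        }
        if (p.size() > s.size()) {
            return false;
        }
        return p.equals(s.subList(0, p.size()));
    }

    public static void main(String[] args) {
        List<String> oldCols = List.of("id", "name");
        List<String> extended = List.of("id", "name", "added_col");
        List<String> changed = List.of("id", "renamed", "added_col");
        System.out.println(isPrefixOf(oldCols, extended)); // true: extension only
        System.out.println(isPrefixOf(oldCols, changed));  // false: column changed
    }
}
```

In the extension-only case the stats of existing columns stay valid, which is why the prefix test is enough to decide whether clearing per-partition column stats can be skipped.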
[jira] [Work logged] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended
[ https://issues.apache.org/jira/browse/HIVE-23806?focusedWorklogId=457067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457067 ] ASF GitHub Bot logged work on HIVE-23806: - Author: ASF GitHub Bot Created on: 10/Jul/20 09:13 Start Date: 10/Jul/20 09:13 Worklog Time Spent: 10m Work Description: adesh-rao commented on a change in pull request #1215: URL: https://github.com/apache/hive/pull/1215#discussion_r452724372 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -501,6 +502,28 @@ public static boolean areSameColumns(List oldCols, List p, List s) { +if (p == s) { + return true; +} +if (p.size() > s.size()) { + return false; +} +Iterator itP = p.iterator(); Review comment: @kgyrtkirk Maybe we can use `ListUtils.isEqualList(p, p.subList(0, p.size()))`? that way we can avoid most of the code here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457067) Time Spent: 20m (was: 10m) > Avoid clearing column stat states in all partition in case schema is extended > - > > Key: HIVE-23806 > URL: https://issues.apache.org/jira/browse/HIVE-23806 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > in case there are many partitions, adding a new column without cascade may > take a while, because we want to make sure in schema evolution cases that we > don't reuse stats later on by mistake... > however this is not necessary in case the schema is extended -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23833) wrong explain and result when full join with join
[ https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HIVE-23833: - Description: Reproduce: # Create three tables, mytest_t1, mytest_t2, mytest_t4 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;" # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code {color:#ff}join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code){color};" # expected over 6000 output rows, but actually got 685 rows 2 - explain Map Reduce Map Operator Tree: TableScan alias: t1 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: material_code (type: string), wh_guid (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: string) TableScan alias: t2 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: material_code (type: string), wh_guid (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: string) Reduce Operator Tree: Join Operator condition map: {color:#ff}Outer Join 0 to 1{color} keys: 0 _col0 
(type: string) 1 _col0 (type: string) outputColumnNames: _col1, _col3 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: COALESCE(_col1,_col3) (type: string) outputColumnNames: _col0 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 3 - explain STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: $hdt$_1:t2 Fetch Operator limit: -1 $hdt$_2:t5 Fetch Operator limit: -1 Alias -> Map Local Operator Tree: $hdt$_1:t2 TableScan alias: t2 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: material_code is not null (type: boolean) Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: material_code (type: string), wh_guid (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col0 (type: string) 1 _col0 (type: string) $hdt$_2:t5 TableScan alias: t5 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: material_code is not null (type: boolean) Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: material_code (type: string) outputColumnNames: _col0 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 COALESCE(_col0,_col2) (type: string) 1 _col0 (type: string) Stage: Stage-5 Map Reduce Map Operator Tree: TableScan 
alias: t1 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: material_code is not null (type: boolean) Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: material_code (type: string), wh_guid (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: {color:#FF} Inner Join 0 to 1{color} keys: 0 _col0 (type: string) 1 _col0 (type: string) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: {color:#FF} Inner Join 0 to 1{color} keys: 0 COA
[jira] [Updated] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError
[ https://issues.apache.org/jira/browse/HIVE-23800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HIVE-23800: --- Summary: Add hooks when HiveServer2 stops due to OutOfMemoryError (was: Make HiveServer2 oom hook interface) > Add hooks when HiveServer2 stops due to OutOfMemoryError > > > Key: HIVE-23800 > URL: https://issues.apache.org/jira/browse/HIVE-23800 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Make the oom hook an interface of HiveServer2, so users can implement the hook to > do something before HS2 stops, such as dumping the heap or alerting the > devops team. -- This message was sent by Atlassian Jira (v8.3.4#803005)
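A minimal sketch of what such a hook mechanism could look like; OomHook and OomHookSketch are hypothetical names chosen for illustration, not the interface actually added by the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class OomHookSketch {
    // Hypothetical hook interface: implementations run right before the
    // server stops on OutOfMemoryError (e.g. dump the heap, page the on-call).
    public interface OomHook {
        void run(OutOfMemoryError error);
    }

    private final List<OomHook> hooks = new ArrayList<>();

    public void addHook(OomHook hook) {
        hooks.add(hook);
    }

    // Invoked from the server's OOM handler before shutdown.
    public void runHooks(OutOfMemoryError error) {
        for (OomHook hook : hooks) {
            try {
                hook.run(error);
            } catch (Throwable t) {
                // Swallow: one failing hook must not block the remaining
                // hooks or the shutdown itself.
            }
        }
    }

    public static void main(String[] args) {
        OomHookSketch runner = new OomHookSketch();
        StringBuilder log = new StringBuilder();
        runner.addHook(e -> log.append("heap-dump;"));
        runner.addHook(e -> log.append("alert;"));
        runner.runHooks(new OutOfMemoryError("simulated"));
        System.out.println(log); // heap-dump;alert;
    }
}
```

The design point the issue is after: by making the hook an interface rather than hard-coded behavior, users plug in their own pre-shutdown actions without patching HiveServer2.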
[jira] [Work started] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction
[ https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23832 started by Denys Kuzmenko. - > Compaction cleaner fails to clean up deltas when using blocking compaction > -- > > Key: HIVE-23832 > URL: https://issues.apache.org/jira/browse/HIVE-23832 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > > {code} > CREATE TABLE default.compcleanup ( >cda_id int, >cda_run_id varchar(255), >cda_load_ts timestamp, >global_party_id string, >group_id string) > COMMENT 'gp_2_gr' > PARTITIONED BY ( >cda_date int, >cda_job_name varchar(12)) > STORED AS ORC; > -- cda_date=20200601/cda_job_name=core_base > INSERT INTO default.compcleanup VALUES > (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base'); > SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name > = 'core_base'; > UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1; > SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name > = 'core_base'; > ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, > cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT; > {code} > When using blocking compaction, the Cleaner skips processing due to the presence > of an open txn (held by the `ALTER TABLE`) below the Compactor's one. > {code} > AcidUtils - getChildState() ignoring([]) > pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035 > {code} > AcidUtils.processBaseDir > {code} > if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, > validTxnList)) { >return; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
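The skipped-base behavior above follows from the visibility transaction id embedded in the compacted directory name: the directory only becomes usable once no open transaction remains at or below that id. A rough standalone sketch of that rule, where parseVisibilityTxnId and the single minOpenTxnId argument are simplifications of AcidUtils.ParsedBase and ValidTxnList, not Hive's actual signatures:

```java
public class BaseDirVisibilitySketch {
    // Extracts the visibility txn id from a dir name like "base_0000002_v0000035".
    // Simplified stand-in for AcidUtils.ParsedBase parsing.
    public static long parseVisibilityTxnId(String dirName) {
        int v = dirName.lastIndexOf("_v");
        return v < 0 ? 0 : Long.parseLong(dirName.substring(v + 2));
    }

    // A compacted base is usable only if its visibility txn committed below
    // every still-open txn, mirroring the isDirUsable() guard quoted above.
    public static boolean isDirUsable(String dirName, long minOpenTxnId) {
        return parseVisibilityTxnId(dirName) < minOpenTxnId;
    }

    public static void main(String[] args) {
        // Blocking compaction: the ALTER TABLE ... AND WAIT txn (say id 30)
        // is still open, so the base written with visibility txn 35 is
        // ignored and the Cleaner leaves the old deltas in place.
        System.out.println(isDirUsable("base_0000002_v0000035", 30L)); // skipped
        System.out.println(isDirUsable("base_0000002_v0000035", 40L)); // usable
    }
}
```

Under this model the bug is a deadlock of bookkeeping, not of data: the waiting statement's own transaction keeps the freshly written base invisible to the Cleaner.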
[jira] [Work started] (HIVE-23618) NotificationLog should also contain events for default/check constraints
[ https://issues.apache.org/jira/browse/HIVE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23618 started by Adesh Kumar Rao. -- > NotificationLog should also contain events for default/check constraints > > > Key: HIVE-23618 > URL: https://issues.apache.org/jira/browse/HIVE-23618 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > > This should follow the same approach as the notNull/unique constraints. This will > also include event replication for these constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23695) [CachedStore] Add unique/default constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-23695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adesh Kumar Rao reassigned HIVE-23695: -- Assignee: Ashish Sharma (was: Adesh Kumar Rao) > [CachedStore] Add unique/default constraints in CachedStore > --- > > Key: HIVE-23695 > URL: https://issues.apache.org/jira/browse/HIVE-23695 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Reporter: Adesh Kumar Rao >Assignee: Ashish Sharma >Priority: Major > Fix For: 4.0.0 > > > This is blocked by HIVE-23618 (notification events are not generated for > default/unique constraints, hence created a separate sub-task from > HIVE-22015). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23827) Upgrade to datasketches 1.1.0
[ https://issues.apache.org/jira/browse/HIVE-23827?focusedWorklogId=457059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457059 ] ASF GitHub Bot logged work on HIVE-23827: - Author: ASF GitHub Bot Created on: 10/Jul/20 08:22 Start Date: 10/Jul/20 08:22 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1233: URL: https://github.com/apache/hive/pull/1233 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457059) Time Spent: 20m (was: 10m) > Upgrade to datasketches 1.1.0 > - > > Key: HIVE-23827 > URL: https://issues.apache.org/jira/browse/HIVE-23827 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23827) Upgrade to datasketches 1.1.0
[ https://issues.apache.org/jira/browse/HIVE-23827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23827. - Fix Version/s: 4.0.0 Resolution: Fixed pushed to master. Thank you Denys for reviewing the change! > Upgrade to datasketches 1.1.0 > - > > Key: HIVE-23827 > URL: https://issues.apache.org/jira/browse/HIVE-23827 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2
[ https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23509. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~spedamallu] for fixing this! > MapJoin AssertionError: Capacity must be power of 2 > --- > > Key: HIVE-23509 > URL: https://issues.apache.org/jira/browse/HIVE-23509 > Project: Hive > Issue Type: Bug > Environment: Hive-2.3.6 >Reporter: Shashank Pedamallu >Assignee: Shashank Pedamallu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Observed AssertionError errors in Hive query when rowCount for join is issued > as (2^x)+(2^(x+1)). > Following is the stacktrace: > {noformat} > [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : > Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, > diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: > java.lang.AssertionError: Capacity must be a power of two [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.security.AccessController.doPrivileged(Native Method) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: > Capacity must be a power of two [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOper
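The arithmetic in the description explains the failed assertion: 2^x + 2^(x+1) = 3·2^x, which is never a power of two, so a resize target derived directly from such a row count trips the capacity check. A small sketch of the invariant and the usual remedy of rounding up to the next power of two; isPowerOfTwo and nextPowerOfTwo are illustrative helpers, not the actual BytesBytesMultiHashMap code:

```java
public class CapacitySketch {
    // The invariant the failing assert enforces: capacity must be a power
    // of two so that (hash & (capacity - 1)) can replace a modulo.
    public static boolean isPowerOfTwo(int capacity) {
        return capacity > 0 && (capacity & (capacity - 1)) == 0;
    }

    // Round a requested size up to the next power of two before resizing.
    public static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    public static void main(String[] args) {
        int x = 10;
        int requested = (1 << x) + (1 << (x + 1)); // 3 * 2^10 = 3072
        System.out.println(isPowerOfTwo(requested));                  // false: this shape trips the assert
        System.out.println(nextPowerOfTwo(requested));                // 4096
        System.out.println(isPowerOfTwo(nextPowerOfTwo(requested)));  // true: safe rehash target
    }
}
```

Any row-count estimate of the form 3·2^x lands exactly between two valid capacities, which is why the bug only surfaced for those specific counts.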
[jira] [Work logged] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2
[ https://issues.apache.org/jira/browse/HIVE-23509?focusedWorklogId=457057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457057 ] ASF GitHub Bot logged work on HIVE-23509: - Author: ASF GitHub Bot Created on: 10/Jul/20 08:21 Start Date: 10/Jul/20 08:21 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1026: URL: https://github.com/apache/hive/pull/1026 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 457057) Time Spent: 20m (was: 10m) > MapJoin AssertionError: Capacity must be power of 2 > --- > > Key: HIVE-23509 > URL: https://issues.apache.org/jira/browse/HIVE-23509 > Project: Hive > Issue Type: Bug > Environment: Hive-2.3.6 >Reporter: Shashank Pedamallu >Assignee: Shashank Pedamallu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Observed AssertionError errors in Hive query when rowCount for join is issued > as (2^x)+(2^(x+1)). 
> Following is the stacktrace: > {noformat} > [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : > Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, > diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: > java.lang.AssertionError: Capacity must be a power of two [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.security.AccessController.doPrivileged(Native Method) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 > 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: > Capacity must be a power of two [2020-05-11 05:43:12,137] > {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731) > [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at > org.apache.hadoop.hive.ql.exec.persistence.BytesByte
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457056 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 10/Jul/20 08:17 Start Date: 10/Jul/20 08:17 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r451668879

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
## @@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  /**
+   * This needs the root data directory to which the data needs to be exported to.
+   * The data export here is a list of files either in table/partition that are written to the _files
+   * in the exportRootDataDir provided.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {

Review comment: this check is added to different methods

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/InsertHandler.java
## @@ -76,16 +77,29 @@ public void handle(Context withinContext) throws Exception {
         withinContext.hiveConf);
     Iterable files = eventMessage.getFiles();
+    boolean copyAtLoad = withinContext.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY);

     /*
      * Insert into/overwrite operation shall operate on one or more partitions or even partitions from multiple tables.
      * But, Insert event is generated for each partition to which the data is inserted.
      * So, qlPtns list will have only one entry.
      */
     Partition ptn = (null == qlPtns || qlPtns.isEmpty()) ? null : qlPtns.get(0);
     if (files != null) {
-      // encoded filename/checksum of files, write into _files
-      for (String file : files) {
-        writeFileEntry(qlMdTable, ptn, file, withinContext);
+      if (copyAtLoad) {
+        // encoded filename/checksum of files, write into _files
+        Path dataPath = null;
+        if ((null == qlPtns) || qlPtns.isEmpty()) {
+          dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME);
+        } else {
+          dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME + File.separator

Review comment: Use path constructor instead of appending with File separator

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
## @@ -148,6 +148,13 @@ public static synchronized ReplChangeManager getInstance(Configuration conf)
     return instance;
   }
+
+  public static synchronized ReplChangeManager getInstance() {

Review comment: why do you need this?

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
## @@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  /**
+   * This needs the root data directory to which the data needs to be exported to.
+   * The data export here is a list of files either in table/partition that are written to the _files
+   * in the exportRootDataDir provided.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {
+      return;
+    }
+    boolean done = false;
+    int repeat = 0;
+    while (!done) {

Review comment: use existing retry interface

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
## @@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  private void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {
+      return;
+    }
+    boolean done = false;
+    int repeat = 0;
+    while (!done) {
+      // This is only called for replication that handles MM tables; no need for mmCtx.
+      try (BufferedWriter writer = writer()) {
+        for (Path dataPath : dataPathList) {
+          writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath));
+        }
+        done = true;
+      } catch (IOException e
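The "Use path constructor instead of appending with File separator" comment refers to Hadoop's `org.apache.hadoop.fs.Path`, which offers a `Path(Path parent, String child)` constructor for building child paths. Since a self-contained example cannot pull Hadoop onto the classpath, the sketch below shows the same idea with the JDK's `java.nio.file` API as a stand-in; the method names are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Contrast of the two path-joining styles from the review comment, using the
// JDK path API as a stand-in for Hadoop's org.apache.hadoop.fs.Path.
public class PathJoin {
    // Fragile: bakes in the platform separator and can produce doubled or
    // mixed separators if inputs already end with one.
    static String joinByString(String parent, String child) {
        return parent + java.io.File.separator + child;
    }

    // Robust: the path API normalizes separators and handles edge cases.
    static Path joinByConstructor(String parent, String child) {
        return Paths.get(parent).resolve(child);
    }

    public static void main(String[] args) {
        System.out.println(joinByString("/events/ev1", "data"));
        System.out.println(joinByConstructor("/events/ev1", "data"));
    }
}
```

The constructor style also keeps the result typed as a path rather than a raw string, so later components can be appended without re-parsing.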
[jira] [Work logged] (HIVE-22412) StatsUtils throw NPE when explain
[ https://issues.apache.org/jira/browse/HIVE-22412?focusedWorklogId=457055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457055 ] ASF GitHub Bot logged work on HIVE-22412: - Author: ASF GitHub Bot Created on: 10/Jul/20 08:14 Start Date: 10/Jul/20 08:14 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1209: URL: https://github.com/apache/hive/pull/1209#discussion_r452692916

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
## @@ -1336,6 +1341,9 @@ public static long getSizeOfPrimitiveTypeArraysFromType(String colType, int leng
    */
   public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) {
     Map map = scmoi.getWritableConstantValue();
+    if (null == map || map.isEmpty()) {
+      return 0L;
+    }

Review comment: @StefanXiepj could you update that if according to @belugabehr's proposal?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 457055)
Time Spent: 2h 50m (was: 2h 40m)

> StatsUtils throw NPE when explain
> ---------------------------------
>
> Key: HIVE-22412
> URL: https://issues.apache.org/jira/browse/HIVE-22412
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 1.2.1, 2.0.0, 3.0.0
> Reporter: xiepengjie
> Assignee: xiepengjie
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22412.patch
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The demo is like this:
> {code:java}
> drop table if exists explain_npe_map;
> drop table if exists explain_npe_array;
> drop table if exists explain_npe_struct;
>
> create table explain_npe_map( c1 map );
> create table explain_npe_array ( c1 array );
> create table explain_npe_struct ( c1 struct );
>
> -- error
> set hive.cbo.enable=false;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;
>
> -- correct
> set hive.cbo.enable=true;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;
> {code}
>
> If the conf 'hive.cbo.enable' is set to false, an NPE is thrown; otherwise it is not.
> {code:java}
> hive> drop table if exists explain_npe_map;
> OK
> Time taken: 0.063 seconds
> hive> drop table if exists explain_npe_array;
> OK
> Time taken: 0.035 seconds
> hive> drop table if exists explain_npe_struct;
> OK
> Time taken: 0.015 seconds
> hive> create table explain_npe_map( c1 map );
> OK
> Time taken: 0.584 seconds
> hive> create table explain_npe_array ( c1 array );
> OK
> Time taken: 0.216 seconds
> hive> create table explain_npe_struct ( c1 struct );
> OK
> Time taken: 0.17 seconds
> hive> set hive.cbo.enable=false;
> hive> explain select c1 from explain_npe_map where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_array where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_struct where c1 is null;
> FAILED: RuntimeException Error invoking signature method
> hive> set hive.cbo.enable=true;
> hive> explain select c1 from explain_npe_map where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: explain_npe_map
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: false (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               expressions: c1 (type: map)
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               ListSink
>
> Time taken: 1.593 seconds, Fetched: 20 row(s)
> hive> explain select c1 from explain_npe_array where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: explain_npe_array
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats:
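The NPE occurs because, with CBO disabled, a constant map that is actually `null` (the folded `c1 is null` predicate) reaches StatsUtils.getSizeOfMap, and `scmoi.getWritableConstantValue()` returns null. The patch guards with `null == map || map.isEmpty()` and returns 0. A minimal sketch of the guarded estimator follows; the per-entry byte cost is a made-up number for illustration, not Hive's actual accounting.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the null/empty guard from the HIVE-22412 patch: a size estimator
// for a constant map must tolerate a null constant. ENTRY_OVERHEAD is a
// hypothetical figure, not what Hive charges per entry.
public class MapSize {
    static final long ENTRY_OVERHEAD = 32L; // hypothetical per-entry bytes

    static long getSizeOfMap(Map<?, ?> map) {
        if (null == map || map.isEmpty()) {
            return 0L; // the fix: no NPE on a null constant, no wasted work
        }
        return ENTRY_OVERHEAD * map.size();
    }

    public static void main(String[] args) {
        System.out.println(getSizeOfMap(null)); // 0 instead of an NPE
        Map<String, String> m = new HashMap<>();
        m.put("k", "v");
        System.out.println(getSizeOfMap(m));
    }
}
```

Without the guard, `map.size()` on the null constant is exactly the dereference that fails during `explain` when `hive.cbo.enable=false`.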
[jira] [Updated] (HIVE-23833) wrong explain and result when full join with join
[ https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HIVE-23833: - Component/s: Hive Affects Version/s: 2.1.1 Description:

Reproduce:
# Create three tables, mytest_t1, mytest_t2, mytest_t4
# hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
# hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code);"

Explain for query 2 (the full join is kept as an outer join):
  Map Reduce
    Map Operator Tree:
        TableScan
          alias: t1
          Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: material_code (type: string), wh_guid (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string)
        TableScan
          alias: t2
          Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: material_code (type: string), wh_guid (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string)
    Reduce Operator Tree:
      Join Operator
        condition map:
             Outer Join 0 to 1
        keys:
          0 _col0 (type: string)
          1 _col0 (type: string)
        outputColumnNames: _col1, _col3
        Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
        Select Operator
          expressions: COALESCE(_col1,_col3) (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Explain for query 3 (the full join has wrongly become inner joins, with is-not-null filters pushed onto both sides):
STAGE PLANS:
  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_1:t2
          Fetch Operator
            limit: -1
        $hdt$_2:t5
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_1:t2
          TableScan
            alias: t2
            Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: material_code is not null (type: boolean)
              Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: material_code (type: string), wh_guid (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: string)
                    1 _col0 (type: string)
        $hdt$_2:t5
          TableScan
            alias: t5
            Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: material_code is not null (type: boolean)
              Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: material_code (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 COALESCE(_col0,_col2) (type: string)
                    1 _col0 (type: string)

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: material_code is not null (type: boolean)
              Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: material_code (type: string), wh_guid (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: string)
                    1 _col0 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 COALESCE(_co
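The second plan shows why the result is wrong: the full outer join has become inner joins with `material_code is not null` filters pushed onto both inputs, so rows that match only one side of the full join are dropped before COALESCE ever sees them. A small self-contained simulation of the two key sets (data is made up; this is not Hive code):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Simulates the HIVE-23833 report: a FULL JOIN keeps unmatched rows, so
// COALESCE(t1.key, t2.key) is still defined for them; rewriting the full
// join to an inner join (with key-is-not-null filters pushed down) silently
// drops those rows.
public class FullJoinDemo {
    // Keys surviving a full outer join: matched plus one-sided rows.
    static Set<String> fullJoinKeys(Set<String> t1, Set<String> t2) {
        Set<String> keys = new TreeSet<>(t1);
        keys.addAll(t2); // t2-only rows survive too; COALESCE covers both sides
        return keys;
    }

    // Keys surviving an inner join: only rows matched on both sides.
    static Set<String> innerJoinKeys(Set<String> t1, Set<String> t2) {
        Set<String> keys = new TreeSet<>(t1);
        keys.retainAll(t2);
        return keys;
    }

    public static void main(String[] args) {
        Set<String> t1 = new TreeSet<>(Arrays.asList("m1", "m2"));
        Set<String> t2 = new TreeSet<>(Arrays.asList("m2", "m3"));
        System.out.println(fullJoinKeys(t1, t2));  // m1 and m3 preserved
        System.out.println(innerJoinKeys(t1, t2)); // m1 and m3 lost
    }
}
```

Because the follow-on join's condition is `COALESCE(t1.material_code, t2.material_code)`, it is well-defined even for the one-sided rows, so the inner-join conversion is not a valid rewrite here.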
[jira] [Updated] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction
[ https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-23832: -- Description:
{code}
CREATE TABLE default.compcleanup (
   cda_id          int,
   cda_run_id      varchar(255),
   cda_load_ts     timestamp,
   global_party_id string,
   group_id        string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date     int,
   cda_job_name varchar(12))
STORED AS ORC;

-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');

SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';

UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;

SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';

ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}
When using blocking compaction, the Cleaner skips processing due to the presence of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
{code}
AcidUtils - getChildState() ignoring([]) pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}
AcidUtils.processBaseDir:
{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, validTxnList)) {
  return;
}
{code}

was:
{code}
CREATE TABLE default.compcleanup (
   cda_id          int,
   cda_run_id      varchar(255),
   cda_load_ts     timestamp,
   global_party_id string,
   group_id        string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date     int,
   cda_job_name varchar(12))
STORED AS ORC;

-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');

SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';

UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;

SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';

ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}
When using blocking compaction, the Cleaner skips processing due to the open txn by `ALTER TABLE` below the Compactor's one.
{code}
AcidUtils - getChildState() ignoring([]) pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}
AcidUtils.processBaseDir:
{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, validTxnList)) {
  return;
}
{code}

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --------------------------------------------------------------------------
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
> Issue Type: Bug
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
>
> {code}
> CREATE TABLE default.compcleanup (
>    cda_id          int,
>    cda_run_id      varchar(255),
>    cda_load_ts     timestamp,
>    global_party_id string,
>    group_id        string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>    cda_date     int,
>    cda_job_name varchar(12))
> STORED AS ORC;
>
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
>
> SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';
>
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
>
> SELECT * FROM default.compcleanup where cda_date = 20200601 and cda_job_name = 'core_base';
>
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence of an open txn (opened by `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir:
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, validTxnList)) {
>   return;
> }
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
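The skipped `base_..._v...` directory carries the compactor's visibility txn id, and `isDirUsable` checks it against the txn snapshot the Cleaner holds. Because the blocking `ALTER TABLE ... COMPACT 'MAJOR' AND WAIT` keeps its own txn open below the compactor's, the snapshot's high watermark excludes the base directory, so the deltas are never cleaned. The following is a deliberately simplified toy model of that check; it is an assumption for illustration, not Hive's exact ValidTxnList semantics.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of the skip described above: a compacted base dir carries a
// visibility txn id, and the Cleaner only treats it as usable when that txn
// is committed in its snapshot. Simplified; not Hive's actual logic.
public class CleanerModel {
    final long highWatermark; // highest txn id covered by the snapshot
    final Set<Long> openTxns; // txns still open when the snapshot was taken

    CleanerModel(long highWatermark, Set<Long> openTxns) {
        this.highWatermark = highWatermark;
        this.openTxns = openTxns;
    }

    // Usable only if the writing txn is committed according to this snapshot.
    boolean isDirUsable(long visibilityTxnId) {
        return visibilityTxnId <= highWatermark && !openTxns.contains(visibilityTxnId);
    }

    public static void main(String[] args) {
        // The blocking ALTER TABLE's txn (30 here) is still open, capping the
        // snapshot below the compactor's visibility txn (35 here): the base
        // dir is deemed unusable and the cleanup pass leaves the deltas.
        Set<Long> open = new TreeSet<>();
        open.add(30L);
        CleanerModel cleaner = new CleanerModel(29L, open);
        System.out.println(cleaner.isDirUsable(35L));
    }
}
```

In the toy model, once txn 30 commits, a fresh snapshot with a higher watermark would admit the base directory and the next cleaner cycle could remove the deltas.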