[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
[ https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449644 ]

ASF GitHub Bot logged work on HIVE-23730:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Jun/20 05:28
Start Date: 23/Jun/20 05:28
Worklog Time Spent: 10m

Work Description: jcamachor commented on a change in pull request #1152:
URL: https://github.com/apache/hive/pull/1152#discussion_r443970247

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java

## @@ -1566,13 +1569,38 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
       List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-
-      tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
-          keyCol.getColumn(), selectedMJOpRatio);
+      String realTSColName = getOriginalTSColName(selectedMJOp, keyCol.getColumn());
+      if (realTSColName != null) {
+        tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
+            realTSColName, selectedMJOpRatio);
+      } else {
+        LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ Schema {} ", keyCol, selectedMJOp.getSchema());

Review comment: Cool! I think we should enable it by default indeed.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 449644)
Time Spent: 1h (was: 50m)

> Compiler support tracking TS keyColName for Probe MapJoin
> ---------------------------------------------------------
>
> Key: HIVE-23730
> URL: https://issues.apache.org/jira/browse/HIVE-23730
> Project: Hive
> Issue Type: Sub-task
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The compiler needs to track the original TS key column name used for MJ probedecode.
> Even though we know the MJ keyCol at compile time, it may have been generated by previous (parent) operators, so we don't always know the original TS column it maps to.
> To find the original column mapping, we need to track the MJ keyCol through the operator pipeline. Tracking can be done through the parent operators' ColumnExprMap and RowSchema.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
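The column-tracking idea described in the issue above (follow the MapJoin key column back through each parent operator's column mapping until the TableScan is reached, and give up when the lineage breaks) can be sketched roughly as follows. This is a simplified illustration, not Hive's actual API: `OpNode` and `resolveOriginalColumn` are hypothetical stand-ins, and the real `ColumnExprMap` maps names to expression descriptors rather than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hedged sketch of tracking a column name up an operator pipeline.
 * OpNode is a hypothetical stand-in for a Hive operator: it has one
 * parent (null for the TableScan) and a map from its output column
 * names to its input column names.
 */
public class ColumnLineageSketch {

    static final class OpNode {
        final OpNode parent;                      // null marks the TableScan
        final Map<String, String> columnExprMap;  // output column -> input column

        OpNode(OpNode parent, Map<String, String> columnExprMap) {
            this.parent = parent;
            this.columnExprMap = columnExprMap;
        }
    }

    /**
     * Walk up the pipeline, translating the column name at each step.
     * Returns the original TableScan column name, or null when some
     * operator computed (rather than forwarded) the column -- the case
     * the LOG.warn branch in the diff above handles.
     */
    static String resolveOriginalColumn(OpNode op, String colName) {
        String current = colName;
        OpNode node = op;
        while (node.parent != null) {
            String mapped = node.columnExprMap.get(current);
            if (mapped == null) {
                return null; // lineage broken: no original TS column exists
            }
            current = mapped;
            node = node.parent;
        }
        return current; // reached the TableScan
    }

    public static void main(String[] args) {
        OpNode ts = new OpNode(null, new HashMap<>());     // TableScan emits "id"
        Map<String, String> selMap = new HashMap<>();
        selMap.put("_col0", "id");                         // SELECT renames id -> _col0
        OpNode sel = new OpNode(ts, selMap);
        Map<String, String> mjMap = new HashMap<>();
        mjMap.put("_col0_out", "_col0");                   // MapJoin renames again
        OpNode mj = new OpNode(sel, mjMap);

        System.out.println(resolveOriginalColumn(mj, "_col0_out")); // prints "id"
    }
}
```

The null return is what makes the new compiler code above safe: probe decode is simply skipped (with a warning) when the key column cannot be traced back to a real TableScan column.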
[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert
[ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449643 ]

ASF GitHub Bot logged work on HIVE-23725:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Jun/20 05:26
Start Date: 23/Jun/20 05:26
Worklog Time Spent: 10m

Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443969420

## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java

## @@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
     try {
       if (!validTxnManager.isValidTxnListState()) {
-        LOG.info("Compiling after acquiring locks");
+        LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
         // Snapshot was outdated when locks were acquired, hence regenerate context,
-        // txn list and retry
-        // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-        // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-        // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-        // But if snapshot is not valid, we recompile the query.
-        if (driverContext.isOutdatedTxn()) {
-          driverContext.getTxnManager().rollbackTxn();
-
-          String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-          driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-          lockAndRespond();
-        }
-        driverContext.setRetrial(true);
-        driverContext.getBackupContext().addSubContext(context);
-        driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-        context = driverContext.getBackupContext();
-        driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-            driverContext.getTxnManager().getValidTxns().toString());
-        if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-          validTxnManager.recordValidWriteIds();
-        }
-
-        if (!alreadyCompiled) {
-          // compile internal will automatically reset the perf logger
-          compileInternal(command, true);
-        } else {
-          // Since we're reusing the compiled plan, we need to update its start time for current run
-          driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-        }
-
-        if (!validTxnManager.isValidTxnListState()) {
-          // Throw exception
-          throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+        // txn list and retry (see ReExecutionRetryLockPlugin)
+        try {
+          releaseLocksAndCommitOrRollback(false);
+        } catch (LockException e) {
+          handleHiveException(e, 12);
         }
-
-        //Reset the PerfLogger
-        perfLogger = SessionState.getPerfLogger(true);
-
-        // the reason that we set the txn manager for the cxt here is because each
-        // query has its own ctx object. The txn mgr is shared across the
-        // same instance of Driver, which can run multiple queries.
-        context.setHiveTxnManager(driverContext.getTxnManager());
+        throw handleHiveException(

Review comment:
> In the original logic, if another commit invalidated the snapshot a second time, the query also failed with HiveException.

IIUC that should not happen, because we were holding the locks that we had already acquired; however, now we are releasing them. Hence, the logic is slightly different? In any case, it is straightforward to add a config property such as `HIVE_QUERY_MAX_REEXECUTION_COUNT` for this specific retry, then retrieve it in the `shouldReExecute` method in `ReExecutionRetryLockPlugin`: you have both the number of retries and the conf (`getConf` method) to retrieve the configured maximum. The check on `HIVE_QUERY_MAX_REEXECUTION_COUNT` for the rest of the plugins will need to be moved into their `shouldReExecute` methods too (currently it is done within the `run` method in `ReExecDriver` itself).

Issue Time Tracking
-------------------
Worklog Id: (was: 449643)
Time Spent: 1h 50m (was: 1h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
>
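The reviewer's suggestion above (each re-execution plugin checks its own retry budget in `shouldReExecute`, instead of `ReExecDriver.run` enforcing a global cap) can be sketched in miniature. This is a hypothetical illustration: `RetryLockPluginSketch` is not Hive's real plugin interface, and in real code the maximum would come from `HiveConf` via `HIVE_QUERY_MAX_REEXECUTION_COUNT` rather than a constructor argument.

```java
/**
 * Hedged sketch of a re-execution plugin that bounds its own retries.
 * The class name and constructor-injected limit are illustrative;
 * Hive's ReExecutionRetryLockPlugin would read the limit from HiveConf.
 */
public class RetryLockPluginSketch {

    private final int maxReExecutions; // stand-in for HIVE_QUERY_MAX_REEXECUTION_COUNT

    public RetryLockPluginSketch(int maxReExecutions) {
        this.maxReExecutions = maxReExecutions;
    }

    /**
     * Decide whether to re-run: only when the snapshot was outdated
     * AND the retry budget is not yet exhausted. executionNum counts
     * how many times the query has already executed.
     */
    public boolean shouldReExecute(int executionNum, boolean snapshotOutdated) {
        return snapshotOutdated && executionNum < maxReExecutions;
    }

    public static void main(String[] args) {
        RetryLockPluginSketch plugin = new RetryLockPluginSketch(2);
        System.out.println(plugin.shouldReExecute(1, true));  // prints "true": budget left
        System.out.println(plugin.shouldReExecute(2, true));  // prints "false": exhausted
    }
}
```

Moving the check into each plugin, as suggested, lets different failure modes (outdated snapshot, runtime OOM, etc.) carry independent retry limits instead of sharing one driver-level counter.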
[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics
[ https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anishek Agarwal updated HIVE-23668:
-----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

+1, committed to master.

> Clean up Task for Hive Metrics
> ------------------------------
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch, HIVE-23668.06.patch
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix
[ https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wanguangping updated HIVE-23748:
--------------------------------
Description:

h1. background
* SQL on Tez
* it is an occasional problem

h1. hiveserver2 log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://xxx:1/prod
Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
Transaction isolation: TRANSACTION_REPEATABLE_READ
No rows affected (0.04 seconds)
No rows affected (0.004 seconds)
No rows affected (0.003 seconds)
No rows affected (0.004 seconds)
No rows affected (0.003 seconds)
No rows affected (0.004 seconds)
No rows affected (0.003 seconds)
No rows affected (0.003 seconds)
No rows affected (0.004 seconds)
INFO : Compiling command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use prod
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time taken: 0.887 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use prod
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time taken: 0.197 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (1.096 seconds)
No rows affected (0.004 seconds)
INFO : Compiling command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop table if exists temp.shawnlee_newbase_devicebase
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time taken: 1.324 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop table if exists temp.shawnlee_newbase_devicebase
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time taken: 12.895 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (14.229 seconds)
INFO : Compiling command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): x
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, event
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, type:string, comment:null), FieldSchema(name:device_id, type:string, comment:null), FieldSchema(name:is_new, type:int, comment:null), FieldSchema(name:first_attribute, type:map, comment:null), FieldSchema(name:first_app_version, type:string, comment:null), FieldSchema(name:first_platform_type, type:string, comment:null), FieldSchema(name:first_manufacturer, type:string, comment:null), FieldSchema(name:first_model, type:string, comment:null), FieldSchema(name:first_ipprovince, type:string, comment:null), FieldSchema(name:first_ipcity, type:string, comment:null), FieldSchema(name:last_attribute, type:map, comment:null), FieldSchema(name:last_app_version, type:string, comment:null), FieldSchema(name:last_platform_type, type:string, comment:null), FieldSchema(name:last_manufacturer, type:string, comment:null), FieldSchema(name:last_model, type:string, comment:null), FieldSchema(name:last_ipprovince, type:string, comment:null), FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); Time taken: 78.517 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f):
INFO : Query ID = hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f
INFO : Total jobs = 3
INFO
[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix
[ https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wanguangping updated HIVE-23748:
--------------------------------
Description:

h1. background
* SQL on Tez
* it is an occasional problem

h1. hiveserver2 log

Flume failed to collect the logs, so local log files are collected instead.

*** TIME 1 RUN ***
2020-06-09 03:33:11 INFO log dir:/home/data/wwwuser/falcon-runner/logs
2020-06-09 03:33:11 INFO sql usage1: comments
2020-06-09 03:33:11 INFO
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join=true;
set mapred.job.queue.name=product;
set tez.queue.name=product;
set tez.queue.name=product;
set mapred.job.name=hivetask_smile.wang_994328;
set hive.task.name=hivetask_smile.wang_994328;
set mapred.job.priority=NORMAL;
use prod;
--SQL is written below--
set tez.queue.name=superman;
drop table if exists temp.shawnlee_newbase_devicebase;
create table if not exists temp.shawnlee_newbase_devicebase as
with is_first_day as (
  select day, attribute['\$device_id'] device_id
  from user_profile.dw_uba_event_daily
  where day >= date_sub(current_date(),20) and day <= date_sub(current_date(),1)
    and event != '\$AppStartPassively'
    and attribute['\$is_first_day'] = 1
    and attribute['platform_type'] in ('Android','iOS')
    and project = 'default'
    and attribute['\$device_id'] is not null
  group by day, attribute['\$device_id']
),
last_msg as (
  select a.day, a.device_id, attribute as last_attribute
  from (select day, attribute['time'] ts, attribute['\$device_id'] device_id, attribute,
          row_number() over(partition by day, attribute['\$device_id'] order by attribute['time'] desc) row_number
        from user_profile.dw_uba_event_daily
        where day >= date_sub(current_date(),20) and day <= date_sub(current_date(),1)
          and event != '\$AppStartPassively'
          and attribute['platform_type'] in ('Android','iOS')
          and attribute['\$device_id'] is not null
          and project = 'default') a
  where a.row_number = 1
),
first_msg as (
  select a.day, a.device_id, attribute as first_attribute
  from (select day, attribute['time'] ts, attribute['\$device_id'] device_id, attribute,
          row_number() over(partition by day, attribute['\$device_id'] order by attribute['time']) row_number
        from user_profile.dw_uba_event_daily
        where day >= date_sub(current_date(),20) and day <= date_sub(current_date(),1)
          and event != '\$AppStartPassively'
          and attribute['platform_type'] in ('Android','iOS')
          and attribute['\$device_id'] is not null
          and project = 'default') a
  where a.row_number = 1
),
rihuo as (
  select day, attribute['\$device_id'] device_id
  from user_profile.dw_uba_event_daily
  where day >= date_sub(current_date(),20) and day <= date_sub(current_date(),1)
    and event != '\$AppStartPassively'
    and attribute['platform_type'] in ('Android','iOS')
    and project = 'default'
    and attribute['\$device_id'] is not null
  group by day, attribute['\$device_id']
)
select a.day, a.device_id,
  (case when b.device_id is not null then 1 else 0 end) is_new,
  c.first_attribute,
  c.first_attribute['\$app_version'] as first_app_version, c.first_attribute['platform_type'] as first_platform_type,
  c.first_attribute['\$manufacturer'] as first_manufacturer, c.first_attribute['\$model'] as first_model,
  c.first_attribute['\$province'] as first_ipprovince, c.first_attribute['\$city'] as first_ipcity,
  d.last_attribute,
  d.last_attribute['\$app_version'] as last_app_version, d.last_attribute['platform_type'] as last_platform_type,
  d.last_attribute['\$manufacturer'] as last_manufacturer, d.last_attribute['\$model'] as last_model,
  d.last_attribute['\$province'] as last_ipprovince, d.last_attribute['\$city'] as last_ipcity
from rihuo a
left join is_first_day b on a.day=b.day and a.device_id=b.device_id
left join first_msg c on a.day=c.day and a.device_id=c.device_id
left join last_msg d on a.day=d.day and a.device_id=d.device_id;
insert overwrite table user_devicebase_shawnlee_app PARTITION (day)
select device_id, is_new, first_attribute, first_app_version, first_platform_type, first_manufacturer,
first_model, first_ipprovince, first_ipcity, last_attribute, last_app_version, last_platform_type, last_manufacturer, last_model, last_ipprovince, last_ipcity, day from temp.shawnlee_newbase_devicebase ; --SQL is written above-- " SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See
[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix
[ https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wanguangping updated HIVE-23748: Description: h1. background * SQL on TEZ * it's an occasional problem h1. hiveserver2 log was: h1. [^hiveserver2 log.txt]background * SQL on TEZ * it's an occasional problem h1. hiveserver2 log [^hiveserver2 log.txt] > tez task with File Merge operator generate tmp file with wrong suffix > - > > Key: HIVE-23748 > URL: https://issues.apache.org/jira/browse/HIVE-23748 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0 >Reporter: wanguangping >Priority: Major > > h1. background > * SQL on TEZ > * it's an occasional problem > h1. hiveserver2 log > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix
[ https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wanguangping updated HIVE-23748: Description: h1. [^hiveserver2 log.txt]background * SQL on TEZ * it's an occasional problem h1. hiveserver2 log [^hiveserver2 log.txt] was: h1. background * SQL on TEZ * it's an occasional problem h1. hiveserver2 log [^hiveserver2 log.txt] > tez task with File Merge operator generate tmp file with wrong suffix > - > > Key: HIVE-23748 > URL: https://issues.apache.org/jira/browse/HIVE-23748 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 3.1.0 >Reporter: wanguangping >Priority: Major > > h1. [^hiveserver2 log.txt]background > * SQL on TEZ > * it's an occasional problem > h1. hiveserver2 log > [^hiveserver2 log.txt] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23735) Reducer misestimate for export command
[ https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23735: -- Labels: pull-request-available (was: ) > Reducer misestimate for export command > -- > > Key: HIVE-23735 > URL: https://issues.apache.org/jira/browse/HIVE-23735 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23735.1.wip.patch > > Time Spent: 10m > Remaining Estimate: 0h > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869 > {code} > if (dest_tab.getNumBuckets() > 0) { > ... > } > {code} > For the "export" command, HS2 creates a dummy table and gets > "1" as the number of buckets for it. > {noformat} > set hive.stats.autogather=false; > export table sample_table to '/tmp/export/sampe_db/t1'; > {noformat} > This causes issues in reducer estimates and always ends up with '1' as the > number of reducer tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23735) Reducer misestimate for export command
[ https://issues.apache.org/jira/browse/HIVE-23735?focusedWorklogId=449577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449577 ] ASF GitHub Bot logged work on HIVE-23735: - Author: ASF GitHub Bot Created on: 23/Jun/20 01:07 Start Date: 23/Jun/20 01:07 Worklog Time Spent: 10m Work Description: rbalamohan opened a new pull request #1165: URL: https://github.com/apache/hive/pull/1165 HIVE-23735: Reducer misestimate for export command SemanticAnalyzer::genBucketingSortingDest checks the number of buckets for enforceBucketing, and based on this the number of reducers is determined. Patch adds one more check for bucketCols to ensure that only valid tables get the reducer sink. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449577) Remaining Estimate: 0h Time Spent: 10m > Reducer misestimate for export command > -- > > Key: HIVE-23735 > URL: https://issues.apache.org/jira/browse/HIVE-23735 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: HIVE-23735.1.wip.patch > > Time Spent: 10m > Remaining Estimate: 0h > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869 > {code} > if (dest_tab.getNumBuckets() > 0) { > ... > } > {code} > For the "export" command, HS2 creates a dummy table and gets > "1" as the number of buckets for it. > {noformat} > set hive.stats.autogather=false; > export table sample_table to '/tmp/export/sampe_db/t1'; > {noformat} > This causes issues in reducer estimates and always ends up with '1' as the > number of reducer tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23735) Reducer misestimate for export command
[ https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-23735: Status: Open (was: Patch Available) > Reducer misestimate for export command > -- > > Key: HIVE-23735 > URL: https://issues.apache.org/jira/browse/HIVE-23735 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: HIVE-23735.1.wip.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869 > {code} > if (dest_tab.getNumBuckets() > 0) { > ... > } > {code} > For the "export" command, HS2 creates a dummy table and gets > "1" as the number of buckets for it. > {noformat} > set hive.stats.autogather=false; > export table sample_table to '/tmp/export/sampe_db/t1'; > {noformat} > This causes issues in reducer estimates and always ends up with '1' as the > number of reducer tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23735) Reducer misestimate for export command
[ https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-23735: Status: Patch Available (was: Open) > Reducer misestimate for export command > -- > > Key: HIVE-23735 > URL: https://issues.apache.org/jira/browse/HIVE-23735 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: HIVE-23735.1.wip.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869 > {code} > if (dest_tab.getNumBuckets() > 0) { > ... > } > {code} > For the "export" command, HS2 creates a dummy table and gets > "1" as the number of buckets for it. > {noformat} > set hive.stats.autogather=false; > export table sample_table to '/tmp/export/sampe_db/t1'; > {noformat} > This causes issues in reducer estimates and always ends up with '1' as the > number of reducer tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
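The bucket-count check quoted above can be illustrated with a minimal, self-contained sketch. Note this uses stub classes (`TableStub`, `oldGuard`, `newGuard` are invented names), not Hive's actual `Table` API; it only demonstrates the guard the patch describes, i.e. also requiring non-empty bucket columns before treating the destination as bucketed:

```java
import java.util.Collections;
import java.util.List;

// Stub standing in for Hive's destination table; names are illustrative only.
class TableStub {
    final int numBuckets;
    final List<String> bucketCols;
    TableStub(int numBuckets, List<String> bucketCols) {
        this.numBuckets = numBuckets;
        this.bucketCols = bucketCols;
    }
}

public class BucketGuard {
    // Old guard: bucket count alone. The dummy table created for "export"
    // reports a bucket count, so it wrongly gets the bucketing reducer sink.
    static boolean oldGuard(TableStub t) {
        return t.numBuckets > 0;
    }

    // Patched guard (per the PR description): additionally require explicit
    // bucket columns, so only genuinely bucketed destinations qualify.
    static boolean newGuard(TableStub t) {
        return t.numBuckets > 0 && t.bucketCols != null && !t.bucketCols.isEmpty();
    }

    public static void main(String[] args) {
        TableStub dummyExport = new TableStub(1, Collections.emptyList());
        TableStub bucketed = new TableStub(4, List.of("user_id"));
        System.out.println(oldGuard(dummyExport)); // true: the misestimate
        System.out.println(newGuard(dummyExport)); // false: dummy table filtered out
        System.out.println(newGuard(bucketed));    // true: real bucketed table kept
    }
}
```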
[jira] [Work started] (HIVE-23746) Send task attempts async from AM to daemons
[ https://issues.apache.org/jira/browse/HIVE-23746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23746 started by Mustafa Iman. --- > Send task attempts async from AM to daemons > --- > > Key: HIVE-23746 > URL: https://issues.apache.org/jira/browse/HIVE-23746 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > > LlapTaskCommunicator uses a sync client to send task attempts. There is a fixed > number of communication threads (10 by default). This causes unnecessary > delays when there are enough free execution slots in daemons but they do not > receive all the tasks because of this bottleneck. LlapTaskCommunicator can > use an async client to pass these tasks to daemons. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23747) Increase the number of parallel tasks sent to daemons from am
[ https://issues.apache.org/jira/browse/HIVE-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-23747: --- > Increase the number of parallel tasks sent to daemons from am > - > > Key: HIVE-23747 > URL: https://issues.apache.org/jira/browse/HIVE-23747 > Project: Hive > Issue Type: Sub-task >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > > The number of inflight tasks from AM to a single executor is currently hardcoded to 1 > ([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57] > ). It does not make sense to increase this right now, as communication > between AM and daemons happens synchronously anyway. After resolving > https://issues.apache.org/jira/browse/HIVE-23746 this must be increased to at > least the number of execution slots per daemon. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23746) Send task attempts async from AM to daemons
[ https://issues.apache.org/jira/browse/HIVE-23746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-23746: --- > Send task attempts async from AM to daemons > --- > > Key: HIVE-23746 > URL: https://issues.apache.org/jira/browse/HIVE-23746 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > > LlapTaskCommunicator uses a sync client to send task attempts. There is a fixed > number of communication threads (10 by default). This causes unnecessary > delays when there are enough free execution slots in daemons but they do not > receive all the tasks because of this bottleneck. LlapTaskCommunicator can > use an async client to pass these tasks to daemons. -- This message was sent by Atlassian Jira (v8.3.4#803005)
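The dispatch pattern HIVE-23746 proposes can be sketched with the JDK alone (no Hive or Tez classes; `dispatchAll` and its pool size are illustrative, not LlapTaskCommunicator's real API). With synchronous sends, at most pool-size RPCs are ever in flight; submitting each send as a future lets the caller queue every pending task immediately and collect completions afterwards:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AsyncDispatchSketch {
    // Submit all task sends without blocking the scheduling thread, then
    // wait for completions in bulk. Returns how many sends completed.
    static long dispatchAll(int tasks, int poolSize) {
        ExecutorService senders = Executors.newFixedThreadPool(poolSize);
        List<CompletableFuture<Integer>> inFlight = IntStream.range(0, tasks)
            .mapToObj(id -> CompletableFuture.supplyAsync(() -> id, senders)) // non-blocking submit
            .collect(Collectors.toList());
        long completed = inFlight.stream().map(CompletableFuture::join).count();
        senders.shutdown();
        return completed;
    }

    public static void main(String[] args) {
        // 10 matches the default communicator thread count mentioned in the ticket.
        System.out.println(dispatchAll(100, 10)); // 100
    }
}
```

The key point is that the single scheduling thread only enqueues work; it never blocks on an individual daemon's RPC, which is the bottleneck the ticket describes.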
[jira] [Assigned] (HIVE-23745) Avoid copying userpayload in task communicator
[ https://issues.apache.org/jira/browse/HIVE-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-23745: --- > Avoid copying userpayload in task communicator > -- > > Key: HIVE-23745 > URL: https://issues.apache.org/jira/browse/HIVE-23745 > Project: Hive > Issue Type: Sub-task >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > > [https://github.com/apache/hive/blob/master/llap-common/src/java/org/apache/hadoop/hive/llap/tez/Converters.java#L182] > I sometimes see this copy take a few milliseconds. The delay here adds up for > all tasks of a single vertex in LlapTaskCommunicator, as it processes tasks > one by one. The user payload never changes in this codepath. The copy is made because > of limitations of the Protobuf library. Protobuf 3.1 adds an UnsafeByteOperations > class that avoids copying ByteBuffers. This can be resolved > when Protobuf is upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
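The copy-versus-wrap distinction behind HIVE-23745 can be shown with `java.nio` alone. This is a stdlib-only analogy, not Hive's Converters code: copying a payload buffer costs an allocation plus a byte-for-byte copy, while a wrapped view shares the backing storage, which is what protobuf's `UnsafeByteOperations.unsafeWrap` (available since protobuf-java 3.1) provides for `ByteString`:

```java
import java.nio.ByteBuffer;

public class PayloadCopySketch {
    // Copy: allocates a new array and copies every byte, the O(n) cost the
    // ticket observes taking milliseconds per task.
    static byte[] copyPayload(ByteBuffer payload) {
        byte[] copy = new byte[payload.remaining()];
        payload.duplicate().get(copy); // duplicate() so the source position is untouched
        return copy;
    }

    // Wrap: an O(1) view over the same backing storage. Safe only because,
    // as the ticket notes, the user payload never changes on this codepath.
    static ByteBuffer wrapPayload(ByteBuffer payload) {
        return payload.duplicate();
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
        System.out.println(copyPayload(payload).length);                     // 4
        System.out.println(wrapPayload(payload).array() == payload.array()); // true: shared storage
    }
}
```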
[jira] [Updated] (HIVE-23596) LLAP: Encode initial guaranteed task information in containerId
[ https://issues.apache.org/jira/browse/HIVE-23596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman updated HIVE-23596: Parent: HIVE-23744 Issue Type: Sub-task (was: Improvement) > LLAP: Encode initial guaranteed task information in containerId > --- > > Key: HIVE-23596 > URL: https://issues.apache.org/jira/browse/HIVE-23596 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We should avoid calling LlapTaskScheduler to get initial isguaranteed flag > for all the tasks. It causes arbitrary delays in sending tasks out. Since > communicator is a single thread, any blocking there delays all the tasks. > There are [https://jira.apache.org/jira/browse/TEZ-4192] and > [https://jira.apache.org/jira/browse/HIVE-23589] for a proper solution to > this. However, that requires a Tez release which seems far right now. We can > replace the current hack with another hack that does not require locking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23744) Reduce query startup latency
[ https://issues.apache.org/jira/browse/HIVE-23744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-23744: --- > Reduce query startup latency > > > Key: HIVE-23744 > URL: https://issues.apache.org/jira/browse/HIVE-23744 > Project: Hive > Issue Type: Task > Components: llap >Affects Versions: 4.0.0 >Reporter: Mustafa Iman >Assignee: Mustafa Iman >Priority: Major > Attachments: am_schedule_and_transmit.png, task_start.png > > > When I run queries with a large number of tasks for a single vertex, I see a > significant delay before all tasks start execution in llap daemons. > Although llap daemons have the free capacity to run the tasks, it takes > significant time to schedule all the tasks in the AM and actually transmit them > to executors. > "am_schedule_and_transmit" shows scheduling of tasks of tpcds query 55. It > shows only the tasks scheduled for one of 10 llap daemons. The scheduler > works in a single thread, scheduling tasks one by one. A delay in scheduling > one task delays all the tasks. > !am_schedule_and_transmit.png|width=831,height=573! > > Another issue is that it takes a long time to fill all the execution slots in > llap daemons even though they are all empty initially. This is caused by > LlapTaskCommunicator using a fixed number of threads (10 by default) to send > the tasks to daemons. Also, this communication is synchronous, so these > threads stay idle while blocked on communication. "task_start.png" shows running > tasks on an llap daemon that has 12 execution slots. By the time the 12th task > starts running, more than 100ms has already passed. That slot stays idle all this > time. > !task_start.png|width=1166,height=635! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23743) hive-druid-handler shaded jar doesn't include maven-artifact classes
[ https://issues.apache.org/jira/browse/HIVE-23743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hankó Gergely reassigned HIVE-23743: > hive-druid-handler shaded jar doesn't include maven-artifact classes > > > Key: HIVE-23743 > URL: https://issues.apache.org/jira/browse/HIVE-23743 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: Hankó Gergely >Assignee: Nishant Bangarwa >Priority: Major > > hive-druid-handler depends on druid-processing jar that depends on classes > from maven-artifact jar but these classes are not included in the shaded jar > so the following Exception may occur: > {code:java} > java.lang.ClassNotFoundException: > org.apache.maven.artifact.versioning.ArtifactVersion at > ... > org.apache.hive.druid.org.apache.druid.query.ordering.StringComparators.(StringComparators.java:44) > at > org.apache.hive.druid.org.apache.druid.query.ordering.StringComparator.fromString(StringComparator.java:35) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
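One plausible fix for the missing maven-artifact classes is to add the dependency to the shade plugin's artifact set in the hive-druid-handler pom. This fragment is an illustrative sketch only; the module's actual `maven-shade-plugin` configuration (and whether it relocates these classes under `org.apache.hive.druid`) may differ:

```xml
<!-- Hypothetical fragment for hive-druid-handler's maven-shade-plugin
     configuration: include maven-artifact so that classes such as
     org.apache.maven.artifact.versioning.ArtifactVersion, needed by
     druid-processing's StringComparators, end up in the shaded jar. -->
<artifactSet>
  <includes>
    <include>org.apache.maven:maven-artifact</include>
  </includes>
</artifactSet>
```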
[jira] [Assigned] (HIVE-23640) Fix FindBug issues in hive-druid-handler
[ https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23640: - Assignee: Panagiotis Garefalakis > Fix FindBug issues in hive-druid-handler > > > Key: HIVE-23640 > URL: https://issues.apache.org/jira/browse/HIVE-23640 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23640) Fix FindBug issues in hive-druid-handler
[ https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23640: -- Labels: pull-request-available (was: ) > Fix FindBug issues in hive-druid-handler > > > Key: HIVE-23640 > URL: https://issues.apache.org/jira/browse/HIVE-23640 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23640) Fix FindBug issues in hive-druid-handler
[ https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23640 started by Panagiotis Garefalakis. - > Fix FindBug issues in hive-druid-handler > > > Key: HIVE-23640 > URL: https://issues.apache.org/jira/browse/HIVE-23640 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23640) Fix FindBug issues in hive-druid-handler
[ https://issues.apache.org/jira/browse/HIVE-23640?focusedWorklogId=449403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449403 ] ASF GitHub Bot logged work on HIVE-23640: - Author: ASF GitHub Bot Created on: 22/Jun/20 17:25 Start Date: 22/Jun/20 17:25 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1164: URL: https://github.com/apache/hive/pull/1164 Change-Id: I8a05baac6fd3b98eb513fd1cfa702409e052bc27 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449403) Remaining Estimate: 0h Time Spent: 10m > Fix FindBug issues in hive-druid-handler > > > Key: HIVE-23640 > URL: https://issues.apache.org/jira/browse/HIVE-23640 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics
[ https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23668: --- Attachment: HIVE-23668.06.patch Status: Patch Available (was: In Progress) > Clean up Task for Hive Metrics > -- > > Key: HIVE-23668 > URL: https://issues.apache.org/jira/browse/HIVE-23668 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, > HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch, > HIVE-23668.06.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics
[ https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23668: --- Status: In Progress (was: Patch Available) > Clean up Task for Hive Metrics > -- > > Key: HIVE-23668 > URL: https://issues.apache.org/jira/browse/HIVE-23668 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, > HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23639) Fix FindBug issues in hive-contrib
[ https://issues.apache.org/jira/browse/HIVE-23639?focusedWorklogId=449396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449396 ] ASF GitHub Bot logged work on HIVE-23639: - Author: ASF GitHub Bot Created on: 22/Jun/20 16:58 Start Date: 22/Jun/20 16:58 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1163: URL: https://github.com/apache/hive/pull/1163 Change-Id: I39afabc24bd9f2a8fca6c2a872069005356688f2 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449396) Remaining Estimate: 0h Time Spent: 10m > Fix FindBug issues in hive-contrib > -- > > Key: HIVE-23639 > URL: https://issues.apache.org/jira/browse/HIVE-23639 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23639) Fix FindBug issues in hive-contrib
[ https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23639: - Assignee: Panagiotis Garefalakis > Fix FindBug issues in hive-contrib > -- > > Key: HIVE-23639 > URL: https://issues.apache.org/jira/browse/HIVE-23639 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23639) Fix FindBug issues in hive-contrib
[ https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23639: -- Labels: pull-request-available (was: ) > Fix FindBug issues in hive-contrib > -- > > Key: HIVE-23639 > URL: https://issues.apache.org/jira/browse/HIVE-23639 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23639) Fix FindBug issues in hive-contrib
[ https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23639 started by Panagiotis Garefalakis. - > Fix FindBug issues in hive-contrib > -- > > Key: HIVE-23639 > URL: https://issues.apache.org/jira/browse/HIVE-23639 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23637) Fix FindBug issues in hive-cli
[ https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23637: - Assignee: Panagiotis Garefalakis > Fix FindBug issues in hive-cli > -- > > Key: HIVE-23637 > URL: https://issues.apache.org/jira/browse/HIVE-23637 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23637) Fix FindBug issues in hive-cli
[ https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23637 started by Panagiotis Garefalakis. - > Fix FindBug issues in hive-cli > -- > > Key: HIVE-23637 > URL: https://issues.apache.org/jira/browse/HIVE-23637 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23637) Fix FindBug issues in hive-cli
[ https://issues.apache.org/jira/browse/HIVE-23637?focusedWorklogId=449377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449377 ] ASF GitHub Bot logged work on HIVE-23637: - Author: ASF GitHub Bot Created on: 22/Jun/20 16:30 Start Date: 22/Jun/20 16:30 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1162: URL: https://github.com/apache/hive/pull/1162 Change-Id: I93fa0c8713950e493a3a212511bb86566bb53c46 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449377) Remaining Estimate: 0h Time Spent: 10m > Fix FindBug issues in hive-cli > -- > > Key: HIVE-23637 > URL: https://issues.apache.org/jira/browse/HIVE-23637 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23637) Fix FindBug issues in hive-cli
[ https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23637: -- Labels: pull-request-available (was: ) > Fix FindBug issues in hive-cli > -- > > Key: HIVE-23637 > URL: https://issues.apache.org/jira/browse/HIVE-23637 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142241#comment-17142241 ] Prasanth Jayachandran commented on HIVE-23737: -- cc/ [~rajesh.balamohan] > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > > LLAP has a dagDelete feature added as part of HIVE-9911. But now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362), we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP dagDelete feature. > 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=449336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449336 ] ASF GitHub Bot logged work on HIVE-23638: - Author: ASF GitHub Bot Created on: 22/Jun/20 15:22 Start Date: 22/Jun/20 15:22 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1161: URL: https://github.com/apache/hive/pull/1161 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449336) Remaining Estimate: 0h Time Spent: 10m > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert
[ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449334 ] ASF GitHub Bot logged work on HIVE-23725: - Author: ASF GitHub Bot Created on: 22/Jun/20 15:22 Start Date: 22/Jun/20 15:22 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1151: URL: https://github.com/apache/hive/pull/1151#discussion_r443639126 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java ## @@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command try { if (!validTxnManager.isValidTxnListState()) { - LOG.info("Compiling after acquiring locks"); + LOG.info("Reexecuting after acquiring locks, since snapshot was outdated."); // Snapshot was outdated when locks were acquired, hence regenerate context, - // txn list and retry - // TODO: Lock acquisition should be moved before analyze, this is a bit hackish. - // Currently, we acquire a snapshot, we compile the query wrt that snapshot, - // and then, we acquire locks. If snapshot is still valid, we continue as usual. - // But if snapshot is not valid, we recompile the query. 
- if (driverContext.isOutdatedTxn()) { -driverContext.getTxnManager().rollbackTxn(); - -String userFromUGI = DriverUtils.getUserFromUGI(driverContext); -driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType()); -lockAndRespond(); - } - driverContext.setRetrial(true); - driverContext.getBackupContext().addSubContext(context); - driverContext.getBackupContext().setHiveLocks(context.getHiveLocks()); - context = driverContext.getBackupContext(); - driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY, -driverContext.getTxnManager().getValidTxns().toString()); - if (driverContext.getPlan().hasAcidResourcesInQuery()) { -validTxnManager.recordValidWriteIds(); - } - - if (!alreadyCompiled) { -// compile internal will automatically reset the perf logger -compileInternal(command, true); - } else { -// Since we're reusing the compiled plan, we need to update its start time for current run - driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime()); - } - - if (!validTxnManager.isValidTxnListState()) { -// Throw exception -throw handleHiveException(new HiveException("Operation could not be executed"), 14); + // txn list and retry (see ReExecutionRetryLockPlugin) + try { +releaseLocksAndCommitOrRollback(false); + } catch (LockException e) { +handleHiveException(e, 12); } - - //Reset the PerfLogger - perfLogger = SessionState.getPerfLogger(true); - - // the reason that we set the txn manager for the cxt here is because each - // query has its own ctx object. The txn mgr is shared across the - // same instance of Driver, which can run multiple queries. - context.setHiveTxnManager(driverContext.getTxnManager()); + throw handleHiveException( Review comment: This is an interesting question. In the original logic, if another commit invalidated the snapshot a second time, the query also failed with a HiveException.
The main difference is that we do more work in this case (compiling and acquiring the locks again), so the chance is probably higher that the snapshot gets invalidated a second time, but I don't know if it is high enough that we should consider it. The ReexecDriver uses one global config for the number of retries; it would take some refactoring to make it independently configurable for the different plugins. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449334) Time Spent: 1h 20m (was: 1h 10m) > ValidTxnManager snapshot outdating causing partial reads in merge insert > > > Key: HIVE-23725 > URL: https://issues.apache.org/jira/browse/HIVE-23725 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels:
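The retry flow discussed above — compile, acquire locks, and start over when a concurrent commit has outdated the snapshot — can be sketched as a bounded loop. All names here are illustrative; the real implementation delegates the retry to Hive's re-execution plugin mechanism rather than looping inline.

```java
import java.util.function.BooleanSupplier;

/** Sketch of the snapshot-retry loop. Each iteration stands in for
 *  "compile, acquire locks, check snapshot validity"; on an outdated
 *  snapshot the locks are released and the whole cycle repeats until
 *  the retry budget is exhausted. */
public class SnapshotRetrySketch {
    static int runWithRetry(BooleanSupplier snapshotStillValid, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            // (compile + acquire locks would happen here)
            if (snapshotStillValid.getAsBoolean()) {
                return attempt; // snapshot held: safe to execute the plan
            }
            // (release locks and roll back before retrying)
        }
        throw new IllegalStateException("Operation could not be executed");
    }
}
```

With one shared budget, a query whose snapshot keeps getting invalidated eventually fails, which matches the behaviour the reviewer describes.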
[jira] [Updated] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23638: -- Labels: pull-request-available (was: ) > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 10m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests
[ https://issues.apache.org/jira/browse/HIVE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23742: -- Labels: pull-request-available (was: ) > Remove unintentional execution of TPC-DS query39 in qtests > -- > > Key: HIVE-23742 > URL: https://issues.apache.org/jira/browse/HIVE-23742 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > TPC-DS queries under clientpositive/perf are meant only to check plan > regressions so they should never be really executed thus the execution part > should be removed from query39.q and cbo_query39.q -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests
[ https://issues.apache.org/jira/browse/HIVE-23742?focusedWorklogId=449326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449326 ] ASF GitHub Bot logged work on HIVE-23742: - Author: ASF GitHub Bot Created on: 22/Jun/20 15:07 Start Date: 22/Jun/20 15:07 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1160: URL: https://github.com/apache/hive/pull/1160 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449326) Remaining Estimate: 0h Time Spent: 10m > Remove unintentional execution of TPC-DS query39 in qtests > -- > > Key: HIVE-23742 > URL: https://issues.apache.org/jira/browse/HIVE-23742 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > TPC-DS queries under clientpositive/perf are meant only to check plan > regressions so they should never be really executed thus the execution part > should be removed from query39.q and cbo_query39.q -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-23741: --- Status: Patch Available (was: Open) > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
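The de-duplication idea in the description — one tag object per file, back-referenced by every data buffer — can be sketched with a plain map. Class and method names are illustrative, not LLAP's actual cache types.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of file-level tag storage: the first buffer registered for a
 *  file stores its tag; later buffers receive a reference to that same
 *  object instead of carrying their own per-buffer copy. */
public class FileLevelTagsSketch {
    private final Map<String, String> tagPerFile = new HashMap<>();

    /** Returns the single shared tag object for the given file path. */
    String tagFor(String filePath, String tag) {
        return tagPerFile.computeIfAbsent(filePath, p -> tag);
    }
}
```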
[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
[ https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449314 ] ASF GitHub Bot logged work on HIVE-23730: - Author: ASF GitHub Bot Created on: 22/Jun/20 14:51 Start Date: 22/Jun/20 14:51 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1152: URL: https://github.com/apache/hive/pull/1152#discussion_r443617251 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1566,13 +1569,38 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx) List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable); ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0); - - tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, - keyCol.getColumn(), selectedMJOpRatio); + String realTSColName = getOriginalTSColName(selectedMJOp, keyCol.getColumn()); + if (realTSColName != null) { +tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, +realTSColName, selectedMJOpRatio); + } else { +LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ Schema {} ", keyCol, selectedMJOp.getSchema()); Review comment: Qtest results here: http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1152/4/tests/ Seems that for existing MJ ops the probedecode optimisation works fine (it properly finds the original TS col alias as well). Not sure if we want to enable probe by default, however. Thoughts? cc @ashutoshc This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449314) Time Spent: 50m (was: 40m) > Compiler support tracking TS keyColName for Probe MapJoin > - > > Key: HIVE-23730 > URL: https://issues.apache.org/jira/browse/HIVE-23730 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Compiler needs to track the original TS key columnName used for MJ > probedecode. > Even though we know the MJ keyCol at compile time, this could be generated by > previous (parent) operators thus we dont always know the original TS column > it maps to. > To find the original columnMapping, we need to track the MJ keyCol through > the operator pipeline. Tracking can be done through the parent operator > ColumnExprMap and RowSchema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
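The tracking the issue description outlines — following the MapJoin key column up through each parent operator's ColumnExprMap until the original TableScan column is reached — can be sketched with plain maps standing in for Hive's ColumnExprMap/RowSchema. This is a simplified model, not the Hive operator API.

```java
import java.util.List;
import java.util.Map;

/** Simplified model: each parent operator's map translates a child-side
 *  internal column name to the name it had on the parent's side; walking
 *  the chain yields the original TableScan column, or null when the key
 *  was generated mid-pipeline and the mapping is lost (in which case the
 *  compiler logs a warning and skips probedecode, as in the diff above). */
public class TsColNameSketch {
    static String resolveOriginalTsColName(String col, List<Map<String, String>> parentExprMaps) {
        for (Map<String, String> exprMap : parentExprMaps) {
            String mapped = exprMap.get(col);
            if (mapped == null) {
                return null; // mapping lost: no original TS column to report
            }
            col = mapped;
        }
        return col;
    }
}
```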
[jira] [Assigned] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests
[ https://issues.apache.org/jira/browse/HIVE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-23742: -- > Remove unintentional execution of TPC-DS query39 in qtests > -- > > Key: HIVE-23742 > URL: https://issues.apache.org/jira/browse/HIVE-23742 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Trivial > > TPC-DS queries under clientpositive/perf are meant only to check plan > regressions so they should never be really executed thus the execution part > should be removed from query39.q and cbo_query39.q -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23741: -- Labels: pull-request-available (was: ) > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?focusedWorklogId=449299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449299 ] ASF GitHub Bot logged work on HIVE-23741: - Author: ASF GitHub Bot Created on: 22/Jun/20 14:32 Start Date: 22/Jun/20 14:32 Worklog Time Spent: 10m Work Description: asinkovits opened a new pull request #1159: URL: https://github.com/apache/hive/pull/1159 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449299) Remaining Estimate: 0h Time Spent: 10m > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-23741: --- Attachment: (was: HIVE-23741.01.patch) > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-23741: --- Status: Open (was: Patch Available) > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-23741: --- Status: Patch Available (was: Open) > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Attachments: HIVE-23741.01.patch > > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-23741: --- Attachment: HIVE-23741.01.patch > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Attachments: HIVE-23741.01.patch > > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23741) Store CacheTags in the file cache level
[ https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits reassigned HIVE-23741: -- > Store CacheTags in the file cache level > --- > > Key: HIVE-23741 > URL: https://issues.apache.org/jira/browse/HIVE-23741 > Project: Hive > Issue Type: Improvement >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > > CacheTags are currently stored for every data buffer. The strings are > internalized, but the number of cache tag objects can be reduced by moving > them to the file cache level, and back referencing them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-20890) ACID: Allow whole table ReadLocks to skip all partition locks
[ https://issues.apache.org/jira/browse/HIVE-20890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142024#comment-17142024 ] Denys Kuzmenko commented on HIVE-20890: --- Hi [~pvary]. I can create a pull request if necessary. Regarding the 1st question, we are going to lock the whole table if the number of partitions we want to acquire locks on exceeds the threshold. In this case we are not locking partitions, only the table. > ACID: Allow whole table ReadLocks to skip all partition locks > - > > Key: HIVE-20890 > URL: https://issues.apache.org/jira/browse/HIVE-20890 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Gopal Vijayaraghavan >Assignee: Denys Kuzmenko >Priority: Major > Attachments: HIVE-20890.1.patch, HIVE-20890.2.patch, > HIVE-20890.3.patch, HIVE-20890.4.patch > > > HIVE-19369 proposes adding an EXCL_WRITE lock which does not wait for any > SHARED_READ locks for read operations - in the presence of that lock, the > insert overwrite no longer takes an exclusive lock. > The only exclusive operation will be a schema change or drop table, which > should take an exclusive lock on the entire table directly. > {code} > explain locks select * from tpcds_bin_partitioned_orc_1000.store_sales where > ss_sold_date_sk=2452626 > ++ > | Explain | > ++ > | LOCK INFORMATION: | > | tpcds_bin_partitioned_orc_1000.store_sales -> SHARED_READ | > | tpcds_bin_partitioned_orc_1000.store_sales.ss_sold_date_sk=2452626 -> > SHARED_READ | > ++ > {code} > So the per-partition SHARED_READ locks are no longer necessary, if the lock > builder already includes the table-wide SHARED_READ locks. > The removal of entire partitions is the only part which needs to be taken > care of within this semantics as row-removal instead of directory removal > (i.e "drop partition" -> "truncate partition" and have the truncation trigger > a whole directory cleaner, so that the partition disappears when there are 0 > rows left). 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
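The threshold behaviour described in the comment above can be sketched as follows; the threshold parameter and the string-based lock names are made up for illustration and are not Hive's actual lock-builder API:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: when the number of partitions to lock exceeds a threshold,
 *  request a single table-level SHARED_READ lock and skip the
 *  per-partition locks entirely; otherwise request the table lock plus
 *  one lock per partition. */
public class LockEscalationSketch {
    static List<String> locksToRequest(String table, List<String> partitions, int threshold) {
        List<String> locks = new ArrayList<>();
        locks.add(table); // table-wide SHARED_READ is always included
        if (partitions.size() > threshold) {
            return locks; // escalated: no partition locks at all
        }
        for (String p : partitions) {
            locks.add(table + "." + p);
        }
        return locks;
    }
}
```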
[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
[ https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449159 ] ASF GitHub Bot logged work on HIVE-23730: - Author: ASF GitHub Bot Created on: 22/Jun/20 11:05 Start Date: 22/Jun/20 11:05 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1152: URL: https://github.com/apache/hive/pull/1152#discussion_r443482036 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1566,13 +1569,38 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx) List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable); ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0); - - tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, - keyCol.getColumn(), selectedMJOpRatio); + String realTSColName = getOriginalTSColName(selectedMJOp, keyCol.getColumn()); + if (realTSColName != null) { +tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, +realTSColName, selectedMJOpRatio); + } else { +LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ Schema {} ", keyCol, selectedMJOp.getSchema()); Review comment: Hey @jcamachor , thanks for the comments! The HIVE_IN_TEST trick could work as long as we enable the probedecode optimisation by default, right? (currently it is off) Just enabled the optimisation for this PR (throwing an exception instead of a warning) to identify any existing issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449159) Time Spent: 40m (was: 0.5h) > Compiler support tracking TS keyColName for Probe MapJoin > - > > Key: HIVE-23730 > URL: https://issues.apache.org/jira/browse/HIVE-23730 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Compiler needs to track the original TS key columnName used for MJ > probedecode. > Even though we know the MJ keyCol at compile time, this could be generated by > previous (parent) operators thus we dont always know the original TS column > it maps to. > To find the original columnMapping, we need to track the MJ keyCol through > the operator pipeline. Tracking can be done through the parent operator > ColumnExprMap and RowSchema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
[ https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449156 ] ASF GitHub Bot logged work on HIVE-23730: - Author: ASF GitHub Bot Created on: 22/Jun/20 11:00 Start Date: 22/Jun/20 11:00 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1152: URL: https://github.com/apache/hive/pull/1152#discussion_r443479460 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ## @@ -1566,13 +1569,38 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx) List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable); ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0); - - tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, - keyCol.getColumn(), selectedMJOpRatio); + String realTSColName = getOriginalTSColName(selectedMJOp, keyCol.getColumn()); + if (realTSColName != null) { +tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos, +realTSColName, selectedMJOpRatio); + } else { +LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ Schema {} ", keyCol, selectedMJOp.getSchema()); + } } return tsProbeDecodeCtx; } + private static String getOriginalTSColName(MapJoinOperator mjOp, String internalCoName) { Review comment: Utility method is now moved to operatorUtils This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449156) Time Spent: 0.5h (was: 20m) > Compiler support tracking TS keyColName for Probe MapJoin > - > > Key: HIVE-23730 > URL: https://issues.apache.org/jira/browse/HIVE-23730 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Compiler needs to track the original TS key columnName used for MJ > probedecode. > Even though we know the MJ keyCol at compile time, this could be generated by > previous (parent) operators thus we dont always know the original TS column > it maps to. > To find the original columnMapping, we need to track the MJ keyCol through > the operator pipeline. Tracking can be done through the parent operator > ColumnExprMap and RowSchema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23706) Fix nulls first sorting behavior
[ https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-23706. --- Resolution: Fixed > Fix nulls first sorting behavior > > > Key: HIVE-23706 > URL: https://issues.apache.org/jira/browse/HIVE-23706 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {code} > INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2); > select a from t order by a desc; > {code} > instead of > {code} > 3, 2, 2, 2, 1, null > {code} > should return > {code} > null, 3, 2 ,2 ,2, 1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23706) Fix nulls first sorting behavior
[ https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141913#comment-17141913 ] Krisztian Kasa commented on HIVE-23706: --- Pushed to master. Thank you [~jcamachorodriguez] and [~zabetak] for review. > Fix nulls first sorting behavior > > > Key: HIVE-23706 > URL: https://issues.apache.org/jira/browse/HIVE-23706 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {code} > INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2); > select a from t order by a desc; > {code} > instead of > {code} > 3, 2, 2, 2, 1, null > {code} > should return > {code} > null, 3, 2 ,2 ,2, 1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23706) Fix nulls first sorting behavior
[ https://issues.apache.org/jira/browse/HIVE-23706?focusedWorklogId=449154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449154 ] ASF GitHub Bot logged work on HIVE-23706: - Author: ASF GitHub Bot Created on: 22/Jun/20 10:51 Start Date: 22/Jun/20 10:51 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1131: URL: https://github.com/apache/hive/pull/1131 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 449154) Time Spent: 1h (was: 50m) > Fix nulls first sorting behavior > > > Key: HIVE-23706 > URL: https://issues.apache.org/jira/browse/HIVE-23706 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {code} > INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2); > select a from t order by a desc; > {code} > instead of > {code} > 3, 2, 2, 2, 1, null > {code} > should return > {code} > null, 3, 2 ,2 ,2, 1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
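The ordering the issue asks for — `ORDER BY a DESC` placing NULLs first — can be reproduced with plain Java comparator combinators. This is an illustration of the expected semantics, not Hive's sorting code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sorts descending with nulls first, matching the expected result in
 *  the issue: (1, null, 3, 2, 2, 2) -> (null, 3, 2, 2, 2, 1). */
public class NullsFirstDescSketch {
    static List<Integer> sortDescNullsFirst(List<Integer> vals) {
        List<Integer> out = new ArrayList<>(vals);
        // nullsFirst treats null as smaller than any value; reverseOrder
        // handles the non-null descending comparison.
        out.sort(Comparator.nullsFirst(Comparator.<Integer>reverseOrder()));
        return out;
    }
}
```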
[jira] [Commented] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level
[ https://issues.apache.org/jira/browse/HIVE-23738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141902#comment-17141902 ] Peter Vary commented on HIVE-23738: --- Or partially: the lock components should be only in the debug logs, but it might be good to have an info-level message that the lock request is starting > DBLockManager::lock() : Move lock request to debug level > > > Key: HIVE-23738 > URL: https://issues.apache.org/jira/browse/HIVE-23738 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Trivial > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102] > > For Q78 @30TB scale, it ends up dumping a couple of MBs of log at info level to > print the lock request type. If possible, this should be moved to debug level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
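The split being discussed — a short info-level line plus the large lock-component dump only at debug level — can be sketched like this; the message text is illustrative, and the real code would use SLF4J's level checks in DbLockManager.lock():

```java
/** Sketch: keep a constant-size message at info level and append the
 *  potentially multi-MB lock-component dump only when debug is enabled. */
public class LockLogSketch {
    static String render(String lockComponents, boolean debugEnabled) {
        String msg = "Requesting lock"; // cheap info-level line
        if (debugEnabled) {
            msg += ": " + lockComponents; // full dump only at debug level
        }
        return msg;
    }
}
```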
[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation
[ https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-23611: Attachment: HIVE-23611.02.patch > Mandate fully qualified absolute path for external table base dir during REPL > operation > --- > > Key: HIVE-23611 > URL: https://issues.apache.org/jira/browse/HIVE-23611 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
[ https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141858#comment-17141858 ] Marta Kuczora commented on HIVE-23703: -- +1 Thanks a lot [~klcopp] for the patch! > Major QB compaction with multiple FileSinkOperators results in data loss and > one original file > -- > > Key: HIVE-23703 > URL: https://issues.apache.org/jira/browse/HIVE-23703 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Critical > Labels: compaction, pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > h4. Problems > Example: > {code:java} > drop table if exists tbl2; > create transactional table tbl2 (a int, b int) clustered by (a) into 4 > buckets stored as ORC > TBLPROPERTIES('transactional'='true','transactional_properties'='default'); > insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4); > insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4); > insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code} > E.g. in the example above, bucketId=0 when a=2 and a=6. > 1. Data loss > In non-acid tables, an operator's temp files are named with their task id. > Because of this snippet, temp files in the FileSinkOperator for compaction > tables are identified by their bucket_id. > {code:java} > if (conf.isCompactionTable()) { > fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + > String.format(AcidUtils.BUCKET_DIGITS, bucketId), > isNativeTable(), isSkewedStoredAsSubDirectories); > } else { > fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), > isSkewedStoredAsSubDirectories); > } > {code} > So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and > not 00_0 and 00_1 as they would normally. > In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved > from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already > there with a=6 data, because it too is named bucket_0. 
> You can see it in the logs:
> {code:java}
> WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target path file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0 with a size 610 exists. Trying to delete it.
> {code}
> 2. Results in one original file
> OrcFileMergeOperator merges the results of the FSOp into one file named 00_0.
> h4. Fix
> 1. FSOp will store data as taskId/bucketId, e.g. 0_0/bucket_0.
> 2. OrcFileMergeOperator, instead of merging a bunch of files into one file named 00_0, will merge all files named bucket_0 into one file named bucket_0, and so on.
> 3. MoveTask will get rid of the taskId directories if present and move only the bucket files inside them, in case OrcFileMergeOperator is not run.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
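The collision described above comes down to the temp-file path no longer being unique per task. The minimal sketch below is not Hive's FileSinkOperator code; the class and method names are hypothetical, and it only illustrates why keying temp files by bucket id alone makes two tasks target the same path, while the HIVE-23703 fix (prefixing the task id, taskId/bucketId) keeps them distinct.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Hive code: contrasts the old bucket-only
// temp path scheme with the taskId/bucketId scheme from the fix.
public class BucketPathSketch {

    // Old scheme: the path depends only on the bucket id, so every task
    // that carries rows for bucket 0 writes to the same file.
    static String oldTmpPath(String taskId, int bucketId) {
        return String.format("_tmp.-ext-10002/bucket_%d", bucketId);
    }

    // Fixed scheme: taskId/bucketId, e.g. 0_0/bucket_0.
    static String fixedTmpPath(String taskId, int bucketId) {
        return String.format("_tmp.-ext-10002/%s/bucket_%d", taskId, bucketId);
    }

    public static void main(String[] args) {
        // Two tasks both emit rows hashing to bucket 0 (a=2 and a=6 above).
        Set<String> oldPaths = new HashSet<>();
        oldPaths.add(oldTmpPath("0_0", 0));
        oldPaths.add(oldTmpPath("0_1", 0));
        // Only one distinct path: the second commit clobbers the first.
        System.out.println("old distinct paths: " + oldPaths.size());

        Set<String> fixedPaths = new HashSet<>();
        fixedPaths.add(fixedTmpPath("0_0", 0));
        fixedPaths.add(fixedTmpPath("0_1", 0));
        // Two distinct paths: no overwrite, both buckets' data survives.
        System.out.println("fixed distinct paths: " + fixedPaths.size());
    }
}
```

A merge step can then still combine the per-task `bucket_0` files into a single `bucket_0`, which is exactly what point 2 of the fix has OrcFileMergeOperator do.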
[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics
[ https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aasha Medhi updated HIVE-23668:
-------------------------------
    Attachment: HIVE-23668.05.patch
        Status: Patch Available  (was: In Progress)

> Clean up Task for Hive Metrics
> ------------------------------
>
>                 Key: HIVE-23668
>                 URL: https://issues.apache.org/jira/browse/HIVE-23668
>             Project: Hive
>          Issue Type: Task
>            Reporter: Aasha Medhi
>            Assignee: Aasha Medhi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics
[ https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aasha Medhi updated HIVE-23668:
-------------------------------
    Status: In Progress  (was: Patch Available)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman updated HIVE-23737:
-----------------------------------------
    Environment:     (was: *strong text*)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-23737
>                 URL: https://issues.apache.org/jira/browse/HIVE-23737
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>
> LLAP has a dagDelete feature, added as part of HIVE-9911. Now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362), we could reuse that feature in LLAP.
> Using Tez's dagDelete feature rather than LLAP's current one has some added advantages:
> 1) We can easily extend it to accommodate upcoming features such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
> 2) The feature will be easier to maintain once it is separated out from Hive's code path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman updated HIVE-23737:
-----------------------------------------
    Component/s: llap

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141814#comment-17141814 ]

Syed Shameerur Rahman commented on HIVE-23737:
----------------------------------------------

[~gopalv] [~prasanth_j] Any thoughts on this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman reassigned HIVE-23737:
--------------------------------------------

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced
[ https://issues.apache.org/jira/browse/HIVE-23736?focusedWorklogId=449068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449068 ]

ASF GitHub Bot logged work on HIVE-23736:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Jun/20 08:15
            Start Date: 22/Jun/20 08:15
    Worklog Time Spent: 10m
      Work Description: kasakrisz opened a new pull request #1158:
URL: https://github.com/apache/hive/pull/1158

Testing done:
```
mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_topnkey.q,topnkey_grouping_sets_order.q,topnkey_windowing_order.q,topnkey_grouping_sets_functions.q,topnkey_order_null.q,topnkey_grouping_sets.q,topnkey_windowing.q -pl itests/qtest -Pitests
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
            Worklog Id:     (was: 449068)
    Remaining Estimate: 0h
            Time Spent: 10m

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---------------------------------------------------
>
>                 Key: HIVE-23736
>                 URL: https://issues.apache.org/jira/browse/HIVE-23736
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both the ReduceSink and TopNKey operators have top-n key filtering functionality. If a TNK operator is introduced, this filtering is done twice.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
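For readers unfamiliar with top-n key filtering: it forwards a row downstream only if its key could still rank among the n smallest keys seen so far, which is what HIVE-23736 wants to run in just one operator instead of both ReduceSink and TopNKey. The sketch below is hypothetical, not Hive's TopNKeyOperator or ReduceSinkOperator code, and for simplicity it assumes distinct integer keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of top-n key filtering (assumes distinct keys).
// It buffers the n smallest keys seen so far; any row whose key cannot
// beat the current n-th smallest key is dropped early.
public class TopNKeyFilterSketch {
    private final int n;
    private final TreeSet<Integer> smallestKeys = new TreeSet<>();

    TopNKeyFilterSketch(int n) { this.n = n; }

    // Returns true if the row with this key should be forwarded downstream.
    boolean accept(int key) {
        if (smallestKeys.size() < n) {
            smallestKeys.add(key);     // buffer not full yet: always forward
            return true;
        }
        if (key < smallestKeys.last()) {
            smallestKeys.pollLast();   // evict the current n-th smallest key
            smallestKeys.add(key);
            return true;
        }
        return false;                  // key can never be among the top n
    }

    public static void main(String[] args) {
        TopNKeyFilterSketch filter = new TopNKeyFilterSketch(2);
        List<Integer> forwarded = new ArrayList<>();
        for (int key : new int[] {5, 1, 7, 0, 9}) {
            if (filter.accept(key)) {
                forwarded.add(key);
            }
        }
        // 5 and 1 fill the buffer; 7 is dropped; 0 evicts 5; 9 is dropped.
        System.out.println(forwarded); // [5, 1, 0]
    }
}
```

Running this check in two consecutive operators returns the same rows but pays the comparison and buffering cost twice, which is why the patch disables topn in ReduceSink once a TNK operator is present.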
[jira] [Updated] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced
[ https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-23736:
----------------------------------
    Labels: pull-request-available  (was: )

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced
[ https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-23736:
-------------------------------------

--
This message was sent by Atlassian Jira
(v8.3.4#803005)