[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449644
 ]

ASF GitHub Bot logged work on HIVE-23730:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 05:28
Start Date: 23/Jun/20 05:28
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1152:
URL: https://github.com/apache/hive/pull/1152#discussion_r443970247



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1566,13 +1569,38 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
   List keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
   ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-
-  tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
-      keyCol.getColumn(), selectedMJOpRatio);
+  String realTSColName = getOriginalTSColName(selectedMJOp, keyCol.getColumn());
+  if (realTSColName != null) {
+    tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
+        realTSColName, selectedMJOpRatio);
+  } else {
+    LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ Schema {} ", keyCol, selectedMJOp.getSchema());

Review comment:
   Cool! I think we should enable it by default indeed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449644)
Time Spent: 1h  (was: 50m)

> Compiler support tracking TS keyColName for Probe MapJoin
> -
>
> Key: HIVE-23730
> URL: https://issues.apache.org/jira/browse/HIVE-23730
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Compiler needs to track the original TS key columnName used for MJ 
> probedecode.
> Even though we know the MJ keyCol at compile time, this could be generated by 
> previous (parent) operators, so we don't always know the original TS column 
> it maps to.
> To find the original columnMapping, we need to track the MJ keyCol through 
> the operator pipeline. Tracking can be done through the parent operator 
> ColumnExprMap and RowSchema.
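For illustration, a minimal sketch of such a column-tracking walk, assuming Hive's operator API (Operator#getColumnExprMap, Operator#getParentOperators) and a linear pipeline between the TableScan and the MapJoin. The method name matches getOriginalTSColName from the PR diff above, but the body below is only a sketch, not the actual implementation.

{code:java}
// Sketch only: translate a MapJoin key column name back to the TableScan column
// it originated from, by following each parent operator's ColumnExprMap.
// Assumes org.apache.hadoop.hive.ql.exec.*, org.apache.hadoop.hive.ql.plan.* and java.util.* imports.
private static String getOriginalTSColName(Operator<? extends OperatorDesc> mjOp, String colName) {
  Operator<? extends OperatorDesc> current = mjOp;
  String currentCol = colName;
  while (current != null && !(current instanceof TableScanOperator)) {
    Map<String, ExprNodeDesc> exprMap = current.getColumnExprMap();
    if (exprMap != null && exprMap.containsKey(currentCol)) {
      ExprNodeDesc expr = exprMap.get(currentCol);
      if (!(expr instanceof ExprNodeColumnDesc)) {
        return null; // key is a derived expression, no single original TS column
      }
      currentCol = ((ExprNodeColumnDesc) expr).getColumn();
    }
    // assumes a single-parent (linear) chain between the TS and the MapJoin
    List<Operator<? extends OperatorDesc>> parents = current.getParentOperators();
    current = (parents == null || parents.isEmpty()) ? null : parents.get(0);
  }
  return (current instanceof TableScanOperator) ? currentCol : null;
}
{code}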



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449643
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 05:26
Start Date: 23/Jun/20 05:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443969420



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
    try {
      if (!validTxnManager.isValidTxnListState()) {
-      LOG.info("Compiling after acquiring locks");
+      LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
        // Snapshot was outdated when locks were acquired, hence regenerate context,
-      // txn list and retry
-      // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-      // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-      // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-      // But if snapshot is not valid, we recompile the query.
-      if (driverContext.isOutdatedTxn()) {
-        driverContext.getTxnManager().rollbackTxn();
-
-        String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-        driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-        lockAndRespond();
-      }
-      driverContext.setRetrial(true);
-      driverContext.getBackupContext().addSubContext(context);
-      driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-      context = driverContext.getBackupContext();
-      driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-          driverContext.getTxnManager().getValidTxns().toString());
-      if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-        validTxnManager.recordValidWriteIds();
-      }
-
-      if (!alreadyCompiled) {
-        // compile internal will automatically reset the perf logger
-        compileInternal(command, true);
-      } else {
-        // Since we're reusing the compiled plan, we need to update its start time for current run
-        driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-      }
-
-      if (!validTxnManager.isValidTxnListState()) {
-        // Throw exception
-        throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+      // txn list and retry (see ReExecutionRetryLockPlugin)
+      try {
+        releaseLocksAndCommitOrRollback(false);
+      } catch (LockException e) {
+        handleHiveException(e, 12);
        }
-
-      //Reset the PerfLogger
-      perfLogger = SessionState.getPerfLogger(true);
-
-      // the reason that we set the txn manager for the cxt here is because each
-      // query has its own ctx object. The txn mgr is shared across the
-      // same instance of Driver, which can run multiple queries.
-      context.setHiveTxnManager(driverContext.getTxnManager());
+      throw handleHiveException(

Review comment:
   > In the original logic, if another commit invalidated the snapshot a second time, the query also failed with HiveException.
   
   Iiuc that should not happen because we were holding the locks that we had 
already acquired; however, now we are releasing them. Hence, the logic is 
slightly different?
   
   In any case, it is straightforward to add a config property such as 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for this specific retry, then retrieve it in 
`shouldReExecute` method in `ReExecutionRetryLockPlugin`: You have both the 
number of retries as well as the conf (`getConf` method) to retrieve the max 
number of retries for the config. The check on 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for the rest of the plugins will need to be 
moved into `shouldReExecute` method in those plugins too (currently it is done 
within the `run` method in the `ReExecDriver` itself).
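   
   A rough sketch of the suggested check (assumptions: the plugin keeps a `retryPossible` flag and a `conf` handle set during initialization, and the property name below is hypothetical; this is not the actual patch):
   
   ```java
   // Sketch only: cap this specific lock/snapshot retry with its own config knob,
   // evaluated inside the plugin's shouldReExecute instead of ReExecDriver.run.
   @Override
   public boolean shouldReExecute(int executionNum) {
     int maxRetries = conf.getInt("hive.query.reexecution.retrylock.max.count", 1); // hypothetical property
     return retryPossible && executionNum <= maxRetries;
   }
   ```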
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449643)
Time Spent: 1h 50m  (was: 1h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449642
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 05:26
Start Date: 23/Jun/20 05:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443969420



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
    try {
      if (!validTxnManager.isValidTxnListState()) {
-      LOG.info("Compiling after acquiring locks");
+      LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
        // Snapshot was outdated when locks were acquired, hence regenerate context,
-      // txn list and retry
-      // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-      // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-      // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-      // But if snapshot is not valid, we recompile the query.
-      if (driverContext.isOutdatedTxn()) {
-        driverContext.getTxnManager().rollbackTxn();
-
-        String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-        driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-        lockAndRespond();
-      }
-      driverContext.setRetrial(true);
-      driverContext.getBackupContext().addSubContext(context);
-      driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-      context = driverContext.getBackupContext();
-      driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-          driverContext.getTxnManager().getValidTxns().toString());
-      if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-        validTxnManager.recordValidWriteIds();
-      }
-
-      if (!alreadyCompiled) {
-        // compile internal will automatically reset the perf logger
-        compileInternal(command, true);
-      } else {
-        // Since we're reusing the compiled plan, we need to update its start time for current run
-        driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-      }
-
-      if (!validTxnManager.isValidTxnListState()) {
-        // Throw exception
-        throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+      // txn list and retry (see ReExecutionRetryLockPlugin)
+      try {
+        releaseLocksAndCommitOrRollback(false);
+      } catch (LockException e) {
+        handleHiveException(e, 12);
        }
-
-      //Reset the PerfLogger
-      perfLogger = SessionState.getPerfLogger(true);
-
-      // the reason that we set the txn manager for the cxt here is because each
-      // query has its own ctx object. The txn mgr is shared across the
-      // same instance of Driver, which can run multiple queries.
-      context.setHiveTxnManager(driverContext.getTxnManager());
+      throw handleHiveException(

Review comment:
   ```
   In the original logic, if another commit invalidated the snapshot a second time, the query also failed with HiveException.
   ```
   Iiuc that should not happen because we were holding the locks that we had 
already acquired; however, now we are releasing them. Hence, the logic is 
slightly different?
   
   In any case, it is straightforward to add a config property such as 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for this specific retry, then retrieve it in 
`shouldReExecute` method in `ReExecutionRetryLockPlugin`: You have both the 
number of retries as well as the conf (`getConf` method) to retrieve the max 
number of retries for the config. The check on 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for the rest of the plugins will need to be 
moved into `shouldReExecute` method in those plugins too (currently it is done 
within the `run` method in the `ReExecDriver` itself).
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449642)
Time Spent: 1h 40m  (was: 1.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449641
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 05:25
Start Date: 23/Jun/20 05:25
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443969420



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
    try {
      if (!validTxnManager.isValidTxnListState()) {
-      LOG.info("Compiling after acquiring locks");
+      LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
        // Snapshot was outdated when locks were acquired, hence regenerate context,
-      // txn list and retry
-      // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-      // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-      // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-      // But if snapshot is not valid, we recompile the query.
-      if (driverContext.isOutdatedTxn()) {
-        driverContext.getTxnManager().rollbackTxn();
-
-        String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-        driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-        lockAndRespond();
-      }
-      driverContext.setRetrial(true);
-      driverContext.getBackupContext().addSubContext(context);
-      driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-      context = driverContext.getBackupContext();
-      driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-          driverContext.getTxnManager().getValidTxns().toString());
-      if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-        validTxnManager.recordValidWriteIds();
-      }
-
-      if (!alreadyCompiled) {
-        // compile internal will automatically reset the perf logger
-        compileInternal(command, true);
-      } else {
-        // Since we're reusing the compiled plan, we need to update its start time for current run
-        driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-      }
-
-      if (!validTxnManager.isValidTxnListState()) {
-        // Throw exception
-        throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+      // txn list and retry (see ReExecutionRetryLockPlugin)
+      try {
+        releaseLocksAndCommitOrRollback(false);
+      } catch (LockException e) {
+        handleHiveException(e, 12);
        }
-
-      //Reset the PerfLogger
-      perfLogger = SessionState.getPerfLogger(true);
-
-      // the reason that we set the txn manager for the cxt here is because each
-      // query has its own ctx object. The txn mgr is shared across the
-      // same instance of Driver, which can run multiple queries.
-      context.setHiveTxnManager(driverContext.getTxnManager());
+      throw handleHiveException(

Review comment:
   {quote}
   In the original logic, if another commit invalidated the snapshot a second time, the query also failed with HiveException.
   {quote}
   Iiuc that should not happen because we were holding the locks that we had 
already acquired; however, now we are releasing them. Hence, the logic is 
slightly different?
   
   In any case, it is straightforward to add a config property such as 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for this specific retry, then retrieve it in 
`shouldReExecute` method in `ReExecutionRetryLockPlugin`: You have both the 
number of retries as well as the conf (`getConf` method) to retrieve the max 
number of retries for the config. The check on 
`HIVE_QUERY_MAX_REEXECUTION_COUNT` for the rest of the plugins will need to be 
moved into `shouldReExecute` method in those plugins too (currently it is done 
within the `run` method in the `ReExecDriver` itself).
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449641)
Time Spent: 1.5h  (was: 1h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 

[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics

2020-06-22 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23668:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1 , Committed to master.

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch, 
> HIVE-23668.06.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
explanation.
 SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
 Connecting to jdbc:hive2://xxx:1/prod
 Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
 Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 No rows affected (0.04 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.887 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.197 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (1.096 seconds)
 No rows affected (0.004 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 1.324 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 12.895 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (14.229 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): x
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, event
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
type:string, comment:null), FieldSchema(name:device_id, type:string, 
comment:null), FieldSchema(name:is_new, type:int, comment:null), 
FieldSchema(name:first_attribute, type:map, comment:null), 
FieldSchema(name:first_app_version, type:string, comment:null), 
FieldSchema(name:first_platform_type, type:string, comment:null), 
FieldSchema(name:first_manufacturer, type:string, comment:null), 
FieldSchema(name:first_model, type:string, comment:null), 
FieldSchema(name:first_ipprovince, type:string, comment:null), 
FieldSchema(name:first_ipcity, type:string, comment:null), 
FieldSchema(name:last_attribute, type:map, comment:null), 
FieldSchema(name:last_app_version, type:string, comment:null), 
FieldSchema(name:last_platform_type, type:string, comment:null), 
FieldSchema(name:last_manufacturer, type:string, comment:null), 
FieldSchema(name:last_model, type:string, comment:null), 
FieldSchema(name:last_ipprovince, type:string, comment:null), 
FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); Time 
taken: 78.517 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
 INFO : Query ID = hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f
 INFO : Total jobs = 3
 INFO 

[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
explanation.
 SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
 Connecting to jdbc:hive2://xxx:1/prod
 Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
 Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 INFO : Compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.887 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.197 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (1.096 seconds)
 No rows affected (0.004 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 1.324 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 12.895 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (14.229 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): x
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, event
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
type:string, comment:null), FieldSchema(name:device_id, type:string, 
comment:null), FieldSchema(name:is_new, type:int, comment:null), 
FieldSchema(name:first_attribute, type:map, comment:null), 
FieldSchema(name:first_app_version, type:string, comment:null), 
FieldSchema(name:first_platform_type, type:string, comment:null), 
FieldSchema(name:first_manufacturer, type:string, comment:null), 
FieldSchema(name:first_model, type:string, comment:null), 
FieldSchema(name:first_ipprovince, type:string, comment:null), 
FieldSchema(name:first_ipcity, type:string, comment:null), 
FieldSchema(name:last_attribute, type:map, comment:null), 
FieldSchema(name:last_app_version, type:string, comment:null), 
FieldSchema(name:last_platform_type, type:string, comment:null), 
FieldSchema(name:last_manufacturer, type:string, comment:null), 
FieldSchema(name:last_model, type:string, comment:null), 
FieldSchema(name:last_ipprovince, type:string, comment:null), 
FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); Time 
taken: 78.517 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
 INFO : Query ID = hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f
 INFO : Total jobs = 3
 INFO : Launching Job 1 out of 3
 INFO : Starting task [Stage-1:MAPRED] in serial mode
 INFO : Subscribed to counters: [] for queryId: 
hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f
 INFO : Tez session hasn't been created yet. Opening session
 INFO : Dag name: 
 INFO : Status: Running 

[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
explanation.
 SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
 Connecting to jdbc:hive2://xxx:1/prod
 Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
 Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 No rows affected (0.04 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.003 seconds)
 No rows affected (0.004 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.887 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): use 
prod
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); Time 
taken: 0.197 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (1.096 seconds)
 No rows affected (0.004 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 1.324 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): drop 
table if exists temp.shawnlee_newbase_devicebase
 INFO : Starting task [Stage-0:DDL] in serial mode
 INFO : Completed executing 
command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); Time 
taken: 12.895 seconds
 INFO : OK
 INFO : Concurrency mode is disabled, not creating a lock manager
 No rows affected (14.229 seconds)
 INFO : Compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): x
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, event
 INFO : Semantic Analysis Completed (retrial = false)
 INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
type:string, comment:null), FieldSchema(name:device_id, type:string, 
comment:null), FieldSchema(name:is_new, type:int, comment:null), 
FieldSchema(name:first_attribute, type:map, comment:null), 
FieldSchema(name:first_app_version, type:string, comment:null), 
FieldSchema(name:first_platform_type, type:string, comment:null), 
FieldSchema(name:first_manufacturer, type:string, comment:null), 
FieldSchema(name:first_model, type:string, comment:null), 
FieldSchema(name:first_ipprovince, type:string, comment:null), 
FieldSchema(name:first_ipcity, type:string, comment:null), 
FieldSchema(name:last_attribute, type:map, comment:null), 
FieldSchema(name:last_app_version, type:string, comment:null), 
FieldSchema(name:last_platform_type, type:string, comment:null), 
FieldSchema(name:last_manufacturer, type:string, comment:null), 
FieldSchema(name:last_model, type:string, comment:null), 
FieldSchema(name:last_ipprovince, type:string, comment:null), 
FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
 INFO : Completed compiling 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); Time 
taken: 78.517 seconds
 INFO : Concurrency mode is disabled, not creating a lock manager
 INFO : Executing 
command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
 INFO : Query ID = hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f
 INFO : Total jobs = 3
 INFO 

[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

 

flume failed to collect logs, collect local log files now. 
***
 ***
 TIME 1 RUN*
 ***
 ***
2020-06-09 03:33:11 INFO log dir:/home/data/wwwuser/falcon-runner/logs
2020-06-09 03:33:11 INFO sql usage1: comments
2020-06-09 03:33:11 INFO set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join=true;
set mapred.job.queue.name=product;
set tez.queue.name=product;
set tez.queue.name=product;
set mapred.job.name=hivetask_smile.wang_994328;
set hive.task.name=hivetask_smile.wang_994328;
set mapred.job.priority=NORMAL;
use prod;
--SQL is written below--
set tez.queue.name=superman;
drop table if exists temp.shawnlee_newbase_devicebase;
create table if not exists temp.shawnlee_newbase_devicebase as 
with is_first_day as (
select 
 day,
 attribute['\$device_id'] device_id
from user_profile.dw_uba_event_daily
where day >= date_sub(current_date(),20) and day <=date_sub(current_date(),1)
 and event !='\$AppStartPassively'
 and attribute['\$is_first_day'] = 1
 and attribute['platform_type'] in ('Android','iOS') 
 and project = 'default'
 and attribute['\$device_id'] is not null
group by day,attribute['\$device_id']
),
last_msg as (
select
 a.day,
 a.device_id,
 attribute as last_attribute
from 
(select 
 day,
 attribute['time'] ts,
 attribute['\$device_id'] device_id,
 attribute,
 row_number() over(partition by day,attribute['\$device_id'] order by 
attribute['time'] desc) row_number
from user_profile.dw_uba_event_daily
where day >= date_sub(current_date(),20) and day <=date_sub(current_date(),1)
 and event !='\$AppStartPassively'
 and attribute['platform_type'] in ('Android','iOS') 
 and attribute['\$device_id'] is not null
 and project = 'default') a
where a.row_number=1 
),
first_msg as (
select
 a.day,
 a.device_id,
 attribute as first_attribute
from 
(select 
 day,
 attribute['time'] ts,
 attribute['\$device_id'] device_id,
 attribute,
 row_number() over(partition by day,attribute['\$device_id'] order by 
attribute['time'] ) row_number
from user_profile.dw_uba_event_daily
where day >= date_sub(current_date(),20) and day <=date_sub(current_date(),1)
 and event !='\$AppStartPassively'
 and attribute['platform_type'] in ('Android','iOS') 
 and attribute['\$device_id'] is not null
 and project = 'default') a
where a.row_number=1
),
rihuo as (
select 
 day,
 attribute['\$device_id'] device_id
from user_profile.dw_uba_event_daily
where day >= date_sub(current_date(),20) and day <=date_sub(current_date(),1)
 and event !='\$AppStartPassively'
 and attribute['platform_type'] in ('Android','iOS') 
 and project = 'default'
 and attribute['\$device_id'] is not null
group by day,attribute['\$device_id']
)
select
 a.day,
 a.device_id,
 (case when b.device_id is not null then 1 else 0 end) is_new,
 c.first_attribute,
 c.first_attribute['\$app_version'] as 
first_app_version,c.first_attribute['platform_type'] as first_platform_type,
 c.first_attribute['\$manufacturer'] as 
first_manufacturer,c.first_attribute['\$model'] as first_model,
 c.first_attribute['\$province'] as 
first_ipprovince,c.first_attribute['\$city'] as first_ipcity,
 d.last_attribute,
 d.last_attribute['\$app_version'] as 
last_app_version,d.last_attribute['platform_type'] as last_platform_type,
 d.last_attribute['\$manufacturer'] as 
last_manufacturer,d.last_attribute['\$model'] as last_model,
 d.last_attribute['\$province'] as last_ipprovince,d.last_attribute['\$city'] 
as last_ipcity
from rihuo a 
left join is_first_day b on a.day=b.day and a.device_id=b.device_id
left join first_msg c on a.day=c.day and a.device_id=c.device_id
left join last_msg d on a.day=d.day and a.device_id=d.device_id
;
insert overwrite table user_devicebase_shawnlee_app PARTITION (day)
select 
 device_id,
 is_new,
 first_attribute,
 first_app_version,
 first_platform_type,
 first_manufacturer,
 first_model,
 first_ipprovince,
 first_ipcity,
 last_attribute,
 last_app_version,
 last_platform_type,
 last_manufacturer,
 last_model,
 last_ipprovince,
 last_ipcity,
 day 
from temp.shawnlee_newbase_devicebase
;
--SQL is written above--
"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See 

[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

 

 

 

  was:
h1. [^hiveserver2 log.txt]background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

[^hiveserver2 log.txt]

 

 


> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Description: 
h1. [^hiveserver2 log.txt]background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

[^hiveserver2 log.txt]

 

 

  was:
h1. background
 * SQL on TEZ 
 * it's an occasional problem

h1. hiveserver2 log

[^hiveserver2 log.txt]

 

 


> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. [^hiveserver2 log.txt]background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> [^hiveserver2 log.txt]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23735) Reducer misestimate for export command

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23735:
--
Labels: pull-request-available  (was: )

> Reducer misestimate for export command
> --
>
> Key: HIVE-23735
> URL: https://issues.apache.org/jira/browse/HIVE-23735
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23735.1.wip.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869
> {code}
> if (dest_tab.getNumBuckets() > 0) {
> ...
> }
> {code}
> For "export" command, HS2 creates a dummy table and for this table and gets 
> "1" as the number of buckets.
> {noformat}
> set hive.stats.autogather=false;
> export table sample_table to '/tmp/export/sampe_db/t1';
> {noformat}
> This causes issues in reducer estimates and always ends up with '1' as the 
> number of reducer tasks. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23735) Reducer misestimate for export command

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23735?focusedWorklogId=449577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449577
 ]

ASF GitHub Bot logged work on HIVE-23735:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 01:07
Start Date: 23/Jun/20 01:07
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1165:
URL: https://github.com/apache/hive/pull/1165


   HIVE-23735: Reducer misestimate for export command
   
   SemanticAnalyzer::genBucketingSortingDest checks the number of buckets for 
enforceBucketing, and based on this the number of reducers is determined. The patch 
adds one more check on bucketCols to ensure that only valid tables get the 
reducer sink (see the sketch below).
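   
   For illustration, a minimal sketch of the extra guard described above, assuming Table#getBucketCols() as the accessor; this is not the exact patch:
   
   ```java
   // Sketch only: treat the destination as bucketed only when it actually has
   // bucket columns, so the dummy table created for EXPORT (which reports a
   // bucket count but no bucket columns) no longer forces a single reducer.
   boolean isBucketed = dest_tab.getNumBuckets() > 0
       && dest_tab.getBucketCols() != null
       && !dest_tab.getBucketCols().isEmpty();
   if (isBucketed) {
     // existing genBucketingSortingDest handling (enforce bucketing, add reducer sink)
   }
   ```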
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449577)
Remaining Estimate: 0h
Time Spent: 10m

> Reducer misestimate for export command
> --
>
> Key: HIVE-23735
> URL: https://issues.apache.org/jira/browse/HIVE-23735
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23735.1.wip.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869
> {code}
> if (dest_tab.getNumBuckets() > 0) {
> ...
> }
> {code}
> For "export" command, HS2 creates a dummy table and for this table and gets 
> "1" as the number of buckets.
> {noformat}
> set hive.stats.autogather=false;
> export table sample_table to '/tmp/export/sampe_db/t1';
> {noformat}
> This causes issues in reducer estimates and always ends up with '1' as the 
> number of reducer tasks. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23735) Reducer misestimate for export command

2020-06-22 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23735:

Status: Open  (was: Patch Available)

> Reducer misestimate for export command
> --
>
> Key: HIVE-23735
> URL: https://issues.apache.org/jira/browse/HIVE-23735
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23735.1.wip.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869
> {code}
> if (dest_tab.getNumBuckets() > 0) {
> ...
> }
> {code}
> For "export" command, HS2 creates a dummy table and for this table and gets 
> "1" as the number of buckets.
> {noformat}
> set hive.stats.autogather=false;
> export table sample_table to '/tmp/export/sampe_db/t1';
> {noformat}
> This causes issues in reducer estimates and always ends up with '1' as the 
> number of reducer tasks. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23735) Reducer misestimate for export command

2020-06-22 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23735:

Status: Patch Available  (was: Open)

> Reducer misestimate for export command
> --
>
> Key: HIVE-23735
> URL: https://issues.apache.org/jira/browse/HIVE-23735
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23735.1.wip.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6869
> {code}
> if (dest_tab.getNumBuckets() > 0) {
> ...
> }
> {code}
> For "export" command, HS2 creates a dummy table and for this table and gets 
> "1" as the number of buckets.
> {noformat}
> set hive.stats.autogather=false;
> export table sample_table to '/tmp/export/sampe_db/t1';
> {noformat}
> This causes issues in reducer estimates and always ends up with '1' as the 
> number of reducer tasks. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23746) Send task attempts async from AM to daemons

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23746 started by Mustafa Iman.
---
> Send task attempts async from AM to daemons
> ---
>
> Key: HIVE-23746
> URL: https://issues.apache.org/jira/browse/HIVE-23746
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> LlapTaskCommunicator uses a sync client to send task attempts. There is a fixed 
> number of communication threads (10 by default). This causes unnecessary 
> delays when there are enough free execution slots in daemons but they do not 
> receive all the tasks because of this bottleneck. LlapTaskCommunicator can 
> use an async client to pass these tasks to daemons (see the sketch below). 
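A generic illustration of the intended change, not Hive's actual RPC stack: instead of blocking one of the few communicator threads for the full round trip of each send, the call is handed off and the response is handled via a callback. All names below are made up for the sketch.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncSendSketch {
  // Stands in for the blocking RPC that submits one task attempt to a daemon.
  static String sendTaskBlocking(String taskAttempt) throws InterruptedException {
    Thread.sleep(10); // simulated network round trip
    return "accepted:" + taskAttempt;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService ioPool = Executors.newFixedThreadPool(4); // assumed I/O pool
    for (int i = 0; i < 20; i++) {
      final String taskAttempt = "attempt-" + i;
      CompletableFuture
          .supplyAsync(() -> {
            try {
              return sendTaskBlocking(taskAttempt);
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
              return "failed:" + taskAttempt;
            }
          }, ioPool)
          .thenAccept(System.out::println); // response handled off the submitting thread
    }
    ioPool.shutdown();
    ioPool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}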



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23747) Increase the number of parallel tasks sent to daemons from am

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23747:
---


> Increase the number of parallel tasks sent to daemons from am
> -
>
> Key: HIVE-23747
> URL: https://issues.apache.org/jira/browse/HIVE-23747
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> The number of in-flight tasks from the AM to a single executor is currently hardcoded to 1 
> ([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57]). 
> It does not make sense to increase this right now, as communication 
> between the AM and daemons happens synchronously anyway. After resolving 
> https://issues.apache.org/jira/browse/HIVE-23746 this must be increased to at 
> least the number of execution slots per daemon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23746) Send task attempts async from AM to daemons

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23746:
---


> Send task attempts async from AM to daemons
> ---
>
> Key: HIVE-23746
> URL: https://issues.apache.org/jira/browse/HIVE-23746
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> LlapTaskCommunicator uses a sync client to send task attempts. There is a fixed 
> number of communication threads (10 by default). This causes unnecessary 
> delays when there are enough free execution slots in daemons but they do not 
> receive all the tasks because of this bottleneck. LlapTaskCommunicator can 
> use an async client to pass these tasks to daemons. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23745) Avoid copying userpayload in task communicator

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23745:
---


> Avoid copying userpayload in task communicator
> --
>
> Key: HIVE-23745
> URL: https://issues.apache.org/jira/browse/HIVE-23745
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> [https://github.com/apache/hive/blob/master/llap-common/src/java/org/apache/hadoop/hive/llap/tez/Converters.java#L182]
>  I see this copy take a few milliseconds sometimes. The delay here adds up for 
> all tasks of a single vertex in LlapTaskCommunicator, as it processes tasks 
> one by one. The user payload never changes in this codepath. The copy is made because 
> of limitations of the Protobuf library. Protobuf version 3.1 adds an UnsafeByteOperations 
> class that avoids copying ByteBuffers. This can be resolved 
> when Protobuf is upgraded (see the sketch below).
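A minimal, self-contained sketch of the copy-avoidance described above, assuming protobuf-java 3.1+ on the classpath; this is not the Converters code itself:

{code:java}
import java.nio.ByteBuffer;
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;

public class PayloadWrapSketch {
  public static void main(String[] args) {
    ByteBuffer payload = ByteBuffer.wrap(new byte[] {1, 2, 3});

    // Today: ByteString.copyFrom copies the buffer contents for every task.
    ByteString copied = ByteString.copyFrom(payload.duplicate());

    // With protobuf >= 3.1: wrap without copying. Safe only because the user
    // payload is never mutated on this code path.
    ByteString wrapped = UnsafeByteOperations.unsafeWrap(payload.duplicate());

    System.out.println(copied.size() + " " + wrapped.size());
  }
}
{code}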



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23596) LLAP: Encode initial guaranteed task information in containerId

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-23596:

Parent: HIVE-23744
Issue Type: Sub-task  (was: Improvement)

> LLAP: Encode initial guaranteed task information in containerId
> ---
>
> Key: HIVE-23596
> URL: https://issues.apache.org/jira/browse/HIVE-23596
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should avoid calling LlapTaskScheduler to get the initial isGuaranteed flag 
> for all the tasks. It causes arbitrary delays in sending tasks out. Since the 
> communicator is a single thread, any blocking there delays all the tasks.
> There are [https://jira.apache.org/jira/browse/TEZ-4192] and 
> [https://jira.apache.org/jira/browse/HIVE-23589] for a proper solution to 
> this. However, that requires a Tez release, which seems far off right now. We can 
> replace the current hack with another hack that does not require locking (see the sketch below).
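A generic sketch of the idea in the title, not the actual Tez/LLAP change: reserve one bit of the numeric id to carry the initial guaranteed flag, so the communicator can read it without calling back into the scheduler. The bit position and class name are illustrative only.

{code:java}
public final class GuaranteedIdSketch {
  private static final long GUARANTEED_BIT = 1L << 62; // illustrative bit position

  static long encode(long containerId, boolean guaranteed) {
    return guaranteed ? (containerId | GUARANTEED_BIT) : (containerId & ~GUARANTEED_BIT);
  }

  static boolean isGuaranteed(long encodedId) {
    return (encodedId & GUARANTEED_BIT) != 0;
  }

  public static void main(String[] args) {
    long id = encode(42L, true);
    System.out.println(isGuaranteed(id) + " " + (id & ~GUARANTEED_BIT)); // true 42
  }
}
{code}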



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23744) Reduce query startup latency

2020-06-22 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23744:
---


> Reduce query startup latency
> 
>
> Key: HIVE-23744
> URL: https://issues.apache.org/jira/browse/HIVE-23744
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
> Attachments: am_schedule_and_transmit.png, task_start.png
>
>
> When I run queries with a large number of tasks for a single vertex, I see a 
> significant delay before all tasks start execution in LLAP daemons. 
> Although the LLAP daemons have the free capacity to run the tasks, it takes a 
> significant amount of time to schedule all the tasks in the AM and actually transmit them 
> to executors.
> "am_schedule_and_transmit" shows the scheduling of tasks of TPC-DS query 55. It 
> shows only the tasks scheduled for one of 10 LLAP daemons. The scheduler 
> works in a single thread, scheduling tasks one by one. A delay in scheduling 
> one task delays all the tasks.
> !am_schedule_and_transmit.png|width=831,height=573!
>  
> Another issue is that it takes a long time to fill all the execution slots in 
> the LLAP daemons even though they are all empty initially. This is caused by 
> LlapTaskCommunicator using a fixed number of threads (10 by default) to send 
> the tasks to daemons. Also, this communication is synchronous, so these 
> threads stay idle while blocked on communication. "task_start.png" shows running 
> tasks on an LLAP daemon that has 12 execution slots. By the time the 12th task 
> starts running, more than 100 ms has already passed. That slot stays idle all this 
> time. 
> !task_start.png|width=1166,height=635!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23743) hive-druid-handler shaded jar doesn't include maven-artifact classes

2020-06-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hankó Gergely reassigned HIVE-23743:



> hive-druid-handler shaded jar doesn't include maven-artifact classes
> 
>
> Key: HIVE-23743
> URL: https://issues.apache.org/jira/browse/HIVE-23743
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Hankó Gergely
>Assignee: Nishant Bangarwa
>Priority: Major
>
> hive-druid-handler depends on the druid-processing jar, which depends on classes 
> from the maven-artifact jar, but these classes are not included in the shaded jar, 
> so the following exception may occur:
> {code:java}
> java.lang.ClassNotFoundException: 
> org.apache.maven.artifact.versioning.ArtifactVersion at
> ...
> org.apache.hive.druid.org.apache.druid.query.ordering.StringComparators.(StringComparators.java:44)
>  at 
> org.apache.hive.druid.org.apache.druid.query.ordering.StringComparator.fromString(StringComparator.java:35)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23640:
-

Assignee: Panagiotis Garefalakis

> Fix FindBug issues in hive-druid-handler
> 
>
> Key: HIVE-23640
> URL: https://issues.apache.org/jira/browse/HIVE-23640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23640:
--
Labels: pull-request-available  (was: )

> Fix FindBug issues in hive-druid-handler
> 
>
> Key: HIVE-23640
> URL: https://issues.apache.org/jira/browse/HIVE-23640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23640 started by Panagiotis Garefalakis.
-
> Fix FindBug issues in hive-druid-handler
> 
>
> Key: HIVE-23640
> URL: https://issues.apache.org/jira/browse/HIVE-23640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23640) Fix FindBug issues in hive-druid-handler

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23640?focusedWorklogId=449403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449403
 ]

ASF GitHub Bot logged work on HIVE-23640:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 17:25
Start Date: 22/Jun/20 17:25
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1164:
URL: https://github.com/apache/hive/pull/1164


   Change-Id: I8a05baac6fd3b98eb513fd1cfa702409e052bc27
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449403)
Remaining Estimate: 0h
Time Spent: 10m

> Fix FindBug issues in hive-druid-handler
> 
>
> Key: HIVE-23640
> URL: https://issues.apache.org/jira/browse/HIVE-23640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics

2020-06-22 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23668:
---
Attachment: HIVE-23668.06.patch
Status: Patch Available  (was: In Progress)

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch, 
> HIVE-23668.06.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics

2020-06-22 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23668:
---
Status: In Progress  (was: Patch Available)

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23639?focusedWorklogId=449396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449396
 ]

ASF GitHub Bot logged work on HIVE-23639:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 16:58
Start Date: 22/Jun/20 16:58
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1163:
URL: https://github.com/apache/hive/pull/1163


   Change-Id: I39afabc24bd9f2a8fca6c2a872069005356688f2
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449396)
Remaining Estimate: 0h
Time Spent: 10m

> Fix FindBug issues in hive-contrib
> --
>
> Key: HIVE-23639
> URL: https://issues.apache.org/jira/browse/HIVE-23639
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23639:
-

Assignee: Panagiotis Garefalakis

> Fix FindBug issues in hive-contrib
> --
>
> Key: HIVE-23639
> URL: https://issues.apache.org/jira/browse/HIVE-23639
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23639:
--
Labels: pull-request-available  (was: )

> Fix FindBug issues in hive-contrib
> --
>
> Key: HIVE-23639
> URL: https://issues.apache.org/jira/browse/HIVE-23639
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23639) Fix FindBug issues in hive-contrib

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23639 started by Panagiotis Garefalakis.
-
> Fix FindBug issues in hive-contrib
> --
>
> Key: HIVE-23639
> URL: https://issues.apache.org/jira/browse/HIVE-23639
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23637) Fix FindBug issues in hive-cli

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23637:
-

Assignee: Panagiotis Garefalakis

> Fix FindBug issues in hive-cli
> --
>
> Key: HIVE-23637
> URL: https://issues.apache.org/jira/browse/HIVE-23637
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23637) Fix FindBug issues in hive-cli

2020-06-22 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23637 started by Panagiotis Garefalakis.
-
> Fix FindBug issues in hive-cli
> --
>
> Key: HIVE-23637
> URL: https://issues.apache.org/jira/browse/HIVE-23637
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23637) Fix FindBug issues in hive-cli

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23637?focusedWorklogId=449377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449377
 ]

ASF GitHub Bot logged work on HIVE-23637:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 16:30
Start Date: 22/Jun/20 16:30
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1162:
URL: https://github.com/apache/hive/pull/1162


   Change-Id: I93fa0c8713950e493a3a212511bb86566bb53c46
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449377)
Remaining Estimate: 0h
Time Spent: 10m

> Fix FindBug issues in hive-cli
> --
>
> Key: HIVE-23637
> URL: https://issues.apache.org/jira/browse/HIVE-23637
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23637) Fix FindBug issues in hive-cli

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23637:
--
Labels: pull-request-available  (was: )

> Fix FindBug issues in hive-cli
> --
>
> Key: HIVE-23637
> URL: https://issues.apache.org/jira/browse/HIVE-23637
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142241#comment-17142241
 ] 

Prasanth Jayachandran commented on HIVE-23737:
--

cc/ [~rajesh.balamohan]

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature over 
> LLAP's current dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=449336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449336
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 15:22
Start Date: 22/Jun/20 15:22
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1161:
URL: https://github.com/apache/hive/pull/1161


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449336)
Remaining Estimate: 0h
Time Spent: 10m

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449334
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 15:22
Start Date: 22/Jun/20 15:22
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443639126



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 
   try {
 if (!validTxnManager.isValidTxnListState()) {
-  LOG.info("Compiling after acquiring locks");
+  LOG.info("Reexecuting after acquiring locks, since snapshot was 
outdated.");
   // Snapshot was outdated when locks were acquired, hence regenerate 
context,
-  // txn list and retry
-  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
-  // Currently, we acquire a snapshot, we compile the query wrt that 
snapshot,
-  // and then, we acquire locks. If snapshot is still valid, we 
continue as usual.
-  // But if snapshot is not valid, we recompile the query.
-  if (driverContext.isOutdatedTxn()) {
-driverContext.getTxnManager().rollbackTxn();
-
-String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
-lockAndRespond();
-  }
-  driverContext.setRetrial(true);
-  driverContext.getBackupContext().addSubContext(context);
-  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-  context = driverContext.getBackupContext();
-  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-driverContext.getTxnManager().getValidTxns().toString());
-  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-validTxnManager.recordValidWriteIds();
-  }
-
-  if (!alreadyCompiled) {
-// compile internal will automatically reset the perf logger
-compileInternal(command, true);
-  } else {
-// Since we're reusing the compiled plan, we need to update its 
start time for current run
-
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-  }
-
-  if (!validTxnManager.isValidTxnListState()) {
-// Throw exception
-throw handleHiveException(new HiveException("Operation could not 
be executed"), 14);
+  // txn list and retry (see ReExecutionRetryLockPlugin)
+  try {
+releaseLocksAndCommitOrRollback(false);
+  } catch (LockException e) {
+handleHiveException(e, 12);
   }
-
-  //Reset the PerfLogger
-  perfLogger = SessionState.getPerfLogger(true);
-
-  // the reason that we set the txn manager for the cxt here is 
because each
-  // query has its own ctx object. The txn mgr is shared across the
-  // same instance of Driver, which can run multiple queries.
-  context.setHiveTxnManager(driverContext.getTxnManager());
+  throw handleHiveException(

Review comment:
   This is an interesting question. In the original logic, if another commit 
invalidated the snapshot a second time, the query also failed with a 
HiveException. The main difference is that we do more work in this case 
(compile and acquire the locks again), so the chance is probably higher that 
the snapshot gets invalidated a second time, but I don't know if it is high 
enough that we should consider it. The ReexecDriver uses one global config for 
the number of retries; it would take some refactoring to make it independently 
configurable for the different plugins.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449334)
Time Spent: 1h 20m  (was: 1h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: 

[jira] [Updated] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23638:
--
Labels: pull-request-available  (was: )

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23742:
--
Labels: pull-request-available  (was: )

> Remove unintentional execution of TPC-DS query39 in qtests
> --
>
> Key: HIVE-23742
> URL: https://issues.apache.org/jira/browse/HIVE-23742
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TPC-DS queries under clientpositive/perf are meant only to check plan 
> regressions, so they should never actually be executed; thus the execution part 
> should be removed from query39.q and cbo_query39.q



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23742?focusedWorklogId=449326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449326
 ]

ASF GitHub Bot logged work on HIVE-23742:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 15:07
Start Date: 22/Jun/20 15:07
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #1160:
URL: https://github.com/apache/hive/pull/1160


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449326)
Remaining Estimate: 0h
Time Spent: 10m

> Remove unintentional execution of TPC-DS query39 in qtests
> --
>
> Key: HIVE-23742
> URL: https://issues.apache.org/jira/browse/HIVE-23742
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TPC-DS queries under clientpositive/perf are meant only to check plan 
> regressions, so they should never actually be executed; thus the execution part 
> should be removed from query39.q and cbo_query39.q



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-23741:
---
Status: Patch Available  (was: Open)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.
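For illustration, a deliberately simplified sketch of the back-referencing idea; the class names (FileCacheEntry, CachedBuffer) are made up for this example, and only CacheTag corresponds to an actual Hive class:

{code:java}
// Hypothetical sketch: store one CacheTag per cached file and let each buffer
// reach its tag through a back reference, instead of every buffer holding a tag.
final class FileCacheEntry {
  final CacheTag tag;                     // shared, stored once at the file level
  FileCacheEntry(CacheTag tag) { this.tag = tag; }
}

final class CachedBuffer {
  final FileCacheEntry owner;             // back reference to the owning file entry
  CachedBuffer(FileCacheEntry owner) { this.owner = owner; }
  CacheTag tag() { return owner.tag; }    // tag resolved via the file cache level
}
{code}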



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449314
 ]

ASF GitHub Bot logged work on HIVE-23730:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 14:51
Start Date: 22/Jun/20 14:51
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1152:
URL: https://github.com/apache/hive/pull/1152#discussion_r443617251



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1566,13 +1569,38 @@ private void 
removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
   List keyDesc = 
selectedMJOp.getConf().getKeys().get(posBigTable);
   ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-
-  tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, 
mjSmallTablePos,
-  keyCol.getColumn(), selectedMJOpRatio);
+  String realTSColName = getOriginalTSColName(selectedMJOp, 
keyCol.getColumn());
+  if (realTSColName != null) {
+tsProbeDecodeCtx = new 
TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
+realTSColName, selectedMJOpRatio);
+  } else {
+LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ 
Schema {} ", keyCol, selectedMJOp.getSchema());

Review comment:
   Qtest results here:
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1152/4/tests/
   
   Seems that for existing MJ ops the probedecode optimisation works fine 
(properly finds the original TS col alias as well). Not sure if we want to 
enable probe by default, however. Thoughts? cc @ashutoshc 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449314)
Time Spent: 50m  (was: 40m)

> Compiler support tracking TS keyColName for Probe MapJoin
> -
>
> Key: HIVE-23730
> URL: https://issues.apache.org/jira/browse/HIVE-23730
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Compiler needs to track the original TS key columnName used for MJ 
> probedecode.
> Even though we know the MJ keyCol at compile time, this could be generated by 
> previous (parent) operators, thus we don't always know the original TS column 
> it maps to.
> To find the original columnMapping, we need to track the MJ keyCol through 
> the operator pipeline. Tracking can be done through the parent operator 
> ColumnExprMap and RowSchema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-06-22 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-23742:
--


> Remove unintentional execution of TPC-DS query39 in qtests
> --
>
> Key: HIVE-23742
> URL: https://issues.apache.org/jira/browse/HIVE-23742
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Trivial
>
> TPC-DS queries under clientpositive/perf are meant only to check plan 
> regressions, so they should never actually be executed; thus the execution part 
> should be removed from query39.q and cbo_query39.q



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23741:
--
Labels: pull-request-available  (was: )

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?focusedWorklogId=449299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449299
 ]

ASF GitHub Bot logged work on HIVE-23741:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 14:32
Start Date: 22/Jun/20 14:32
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #1159:
URL: https://github.com/apache/hive/pull/1159


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449299)
Remaining Estimate: 0h
Time Spent: 10m

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-23741:
---
Attachment: (was: HIVE-23741.01.patch)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-23741:
---
Status: Open  (was: Patch Available)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-23741:
---
Status: Patch Available  (was: Open)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-23741.01.patch
>
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-23741:
---
Attachment: HIVE-23741.01.patch

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-23741.01.patch
>
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-23741:
--


> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> CacheTags are currently stored for every data buffer. The strings are 
> internalized, but the number of cache tag objects can be reduced by moving 
> them to the file cache level, and back referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-20890) ACID: Allow whole table ReadLocks to skip all partition locks

2020-06-22 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142024#comment-17142024
 ] 

Denys Kuzmenko commented on HIVE-20890:
---

Hi [~pvary]. I can create a pull request if necessary. Regarding the 1st 
question, we are going to lock the whole table if the number of partitions we 
want to acquire a lock on exceeds the threshold. In this case we are not 
locking the partitions, only the table.
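A rough sketch of the threshold check being described; the config variable and the two helper methods are hypothetical stand-ins, not the names used in the attached patches:

{code:java}
// Hypothetical sketch: take a single table-level SHARED_READ lock once the
// partition count crosses a configured threshold, otherwise lock per partition.
int threshold = conf.getIntVar(HiveConf.ConfVars.HIVE_LOCKS_PARTITION_THRESHOLD); // hypothetical conf var
if (partitions.size() > threshold) {
  components.add(sharedReadOnTable(dbName, tableName));                            // hypothetical helper
} else {
  for (Partition partition : partitions) {
    components.add(sharedReadOnPartition(dbName, tableName, partition.getName())); // hypothetical helper
  }
}
{code}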

> ACID: Allow whole table ReadLocks to skip all partition locks
> -
>
> Key: HIVE-20890
> URL: https://issues.apache.org/jira/browse/HIVE-20890
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-20890.1.patch, HIVE-20890.2.patch, 
> HIVE-20890.3.patch, HIVE-20890.4.patch
>
>
> HIVE-19369 proposes adding a EXCL_WRITE lock which does not wait for any 
> SHARED_READ locks for read operations - in the presence of that lock, the 
> insert overwrite no longer takes an exclusive lock.
> The only exclusive operation will be a schema change or drop table, which 
> should take an exclusive lock on the entire table directly.
> {code}
> explain locks select * from tpcds_bin_partitioned_orc_1000.store_sales where 
> ss_sold_date_sk=2452626 
> ++
> |  Explain   |
> ++
> | LOCK INFORMATION:  |
> | tpcds_bin_partitioned_orc_1000.store_sales -> SHARED_READ |
> | tpcds_bin_partitioned_orc_1000.store_sales.ss_sold_date_sk=2452626 -> 
> SHARED_READ |
> ++
> {code}
> So the per-partition SHARED_READ locks are no longer necessary, if the lock 
> builder already includes the table-wide SHARED_READ locks.
> The removal of entire partitions is the only part which needs to be taken 
> care of within this semantics as row-removal instead of directory removal 
> (i.e "drop partition" -> "truncate partition" and have the truncation trigger 
> a whole directory cleaner, so that the partition disappears when there are 0 
> rows left).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449159
 ]

ASF GitHub Bot logged work on HIVE-23730:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 11:05
Start Date: 22/Jun/20 11:05
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1152:
URL: https://github.com/apache/hive/pull/1152#discussion_r443482036



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1566,13 +1569,38 @@ private void 
removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
   List keyDesc = 
selectedMJOp.getConf().getKeys().get(posBigTable);
   ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-
-  tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, 
mjSmallTablePos,
-  keyCol.getColumn(), selectedMJOpRatio);
+  String realTSColName = getOriginalTSColName(selectedMJOp, 
keyCol.getColumn());
+  if (realTSColName != null) {
+tsProbeDecodeCtx = new 
TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
+realTSColName, selectedMJOpRatio);
+  } else {
+LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ 
Schema {} ", keyCol, selectedMJOp.getSchema());

Review comment:
   Hey @jcamachor , thanks for the comments!
   The HIVE_IN_TEST trick could work as long as we enable the probedecode 
optimisation by default, right? (it is currently off)
   
   Just enabled the optimisation for this PR (throwing an exception instead of 
a warning) to identify any existing issues.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449159)
Time Spent: 40m  (was: 0.5h)

> Compiler support tracking TS keyColName for Probe MapJoin
> -
>
> Key: HIVE-23730
> URL: https://issues.apache.org/jira/browse/HIVE-23730
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compiler needs to track the original TS key columnName used for MJ 
> probedecode.
> Even though we know the MJ keyCol at compile time, this could be generated by 
> previous (parent) operators, thus we don't always know the original TS column 
> it maps to.
> To find the original columnMapping, we need to track the MJ keyCol through 
> the operator pipeline. Tracking can be done through the parent operator 
> ColumnExprMap and RowSchema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23730?focusedWorklogId=449156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449156
 ]

ASF GitHub Bot logged work on HIVE-23730:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 11:00
Start Date: 22/Jun/20 11:00
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1152:
URL: https://github.com/apache/hive/pull/1152#discussion_r443479460



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##
@@ -1566,13 +1569,38 @@ private void 
removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
   List keyDesc = 
selectedMJOp.getConf().getKeys().get(posBigTable);
   ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-
-  tsProbeDecodeCtx = new TableScanOperator.ProbeDecodeContext(mjCacheKey, 
mjSmallTablePos,
-  keyCol.getColumn(), selectedMJOpRatio);
+  String realTSColName = getOriginalTSColName(selectedMJOp, 
keyCol.getColumn());
+  if (realTSColName != null) {
+tsProbeDecodeCtx = new 
TableScanOperator.ProbeDecodeContext(mjCacheKey, mjSmallTablePos,
+realTSColName, selectedMJOpRatio);
+  } else {
+LOG.warn("ProbeDecode could not find TSColName for ColKey {} with MJ 
Schema {} ", keyCol, selectedMJOp.getSchema());
+  }
 }
 return tsProbeDecodeCtx;
   }
 
+  private static String getOriginalTSColName(MapJoinOperator mjOp, String 
internalCoName) {

Review comment:
   Utility method is now moved to operatorUtils 
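For readers following along, a minimal sketch of what such a utility can look like, walking single-parent chains through each operator's ColumnExprMap back to the TableScan; the method name and the early-exit rules are illustrative, not the committed OperatorUtils code:

{code:java}
// Illustrative sketch, assuming the standard Operator accessors
// (getColumnExprMap, getParentOperators) from org.apache.hadoop.hive.ql.exec
// and the ExprNode classes from org.apache.hadoop.hive.ql.plan.
private static String findOriginalTSColName(Operator<?> start, String internalColName) {
  String current = internalColName;
  Operator<?> op = start;
  while (op != null && !(op instanceof TableScanOperator)) {
    Map<String, ExprNodeDesc> exprMap = op.getColumnExprMap();
    if (exprMap != null && exprMap.containsKey(current)) {
      ExprNodeDesc mapped = exprMap.get(current);
      if (!(mapped instanceof ExprNodeColumnDesc)) {
        return null; // key is a derived expression, the original TS column is lost
      }
      current = ((ExprNodeColumnDesc) mapped).getColumn();
    }
    List<Operator<? extends OperatorDesc>> parents = op.getParentOperators();
    op = (parents == null || parents.isEmpty()) ? null : parents.get(0); // single-parent chain only
  }
  return (op instanceof TableScanOperator) ? current : null;
}
{code}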





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449156)
Time Spent: 0.5h  (was: 20m)

> Compiler support tracking TS keyColName for Probe MapJoin
> -
>
> Key: HIVE-23730
> URL: https://issues.apache.org/jira/browse/HIVE-23730
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Compiler needs to track the original TS key columnName used for MJ 
> probedecode.
> Even though we know the MJ keyCol at compile time, this could be generated by 
> previous (parent) operators, thus we don't always know the original TS column 
> it maps to.
> To find the original columnMapping, we need to track the MJ keyCol through 
> the operator pipeline. Tracking can be done through the parent operator 
> ColumnExprMap and RowSchema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23706) Fix nulls first sorting behavior

2020-06-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-23706.
---
Resolution: Fixed

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2);
> select a from t order by a desc;
> {code}
> instead of 
> {code}
> 3, 2, 2, 2, 1, null
> {code}
> should return 
> {code}
> null, 3, 2, 2, 2, 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23706) Fix nulls first sorting behavior

2020-06-22 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141913#comment-17141913
 ] 

Krisztian Kasa commented on HIVE-23706:
---

Pushed to master. Thank you [~jcamachorodriguez] and [~zabetak] for review.

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2);
> select a from t order by a desc;
> {code}
> instead of 
> {code}
> 3, 2, 2, 2, 1, null
> {code}
> should return 
> {code}
> null, 3, 2, 2, 2, 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23706) Fix nulls first sorting behavior

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23706?focusedWorklogId=449154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449154
 ]

ASF GitHub Bot logged work on HIVE-23706:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 10:51
Start Date: 22/Jun/20 10:51
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1131:
URL: https://github.com/apache/hive/pull/1131


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449154)
Time Spent: 1h  (was: 50m)

> Fix nulls first sorting behavior
> 
>
> Key: HIVE-23706
> URL: https://issues.apache.org/jira/browse/HIVE-23706
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> INSERT INTO t(a) VALUES (1), (null), (3), (2), (2), (2);
> select a from t order by a desc;
> {code}
> instead of 
> {code}
> 3, 2, 2, 2, 1, null
> {code}
> should return 
> {code}
> null, 3, 2, 2, 2, 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-22 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141902#comment-17141902
 ] 

Peter Vary commented on HIVE-23738:
---

Or partially. The lock components should only be in the debug logs, but it 
might be good to have an info-level message that the lock request is starting.
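A minimal sketch of that split, using plain SLF4J calls; 'lockComponents' stands in for whatever collection the lock request carries at that point in DbLockManager:

{code:java}
// Short request-level line stays at INFO, the potentially huge component list
// moves to DEBUG so large queries no longer flood the logs.
LOG.info("Requesting lock for queryId={} with {} components", queryId, lockComponents.size());
if (LOG.isDebugEnabled()) {
  LOG.debug("Lock request components: {}", lockComponents);
}
{code}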


> DBLockManager::lock() : Move lock request to debug level
> 
>
> Key: HIVE-23738
> URL: https://issues.apache.org/jira/browse/HIVE-23738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]
>  
> For Q78 @30TB scale, it ends up dumping a couple of MBs of log at info level 
> to print the lock request type. If possible, this should be moved to debug 
> level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-22 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23611:

Attachment: HIVE-23611.02.patch

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file

2020-06-22 Thread Marta Kuczora (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141858#comment-17141858
 ] 

Marta Kuczora commented on HIVE-23703:
--

+1
Thanks a lot [~klcopp] for the patch!

> Major QB compaction with multiple FileSinkOperators results in data loss and 
> one original file
> --
>
> Key: HIVE-23703
> URL: https://issues.apache.org/jira/browse/HIVE-23703
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Critical
>  Labels: compaction, pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> h4. Problems
> Example:
> {code:java}
> drop table if exists tbl2;
> create transactional table tbl2 (a int, b int) clustered by (a) into 4 
> buckets stored as ORC 
> TBLPROPERTIES('transactional'='true','transactional_properties'='default');
> insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
> insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
> insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code}
> E.g. in the example above, bucketId=0 when a=2 and a=6.
> 1. Data loss 
>  In non-acid tables, an operator's temp files are named with their task id. 
> Because of this snippet, temp files in the FileSinkOperator for compaction 
> tables are identified by their bucket_id.
> {code:java}
> if (conf.isCompactionTable()) {
>  fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + 
> String.format(AcidUtils.BUCKET_DIGITS, bucketId),
>  isNativeTable(), isSkewedStoredAsSubDirectories);
>  } else {
>  fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), 
> isSkewedStoredAsSubDirectories);
>  }
> {code}
> So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and 
> not 00_0 and 00_1 as they would normally.
>  In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved 
> from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already 
> there with a=6 data, because it too is named bucket_0. You can see in the 
> logs:
> {code:java}
>  WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target 
> path 
> file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0
>  with a size 610 exists. Trying to delete it.
> {code}
> 2. Results in one original file
>  OrcFileMergeOperator merges the results of the FSOp into 1 file named 
> 00_0.
> h4. Fix
> 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0
> 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named 
> 00_0, will merge all files named bucket_0 into one file named bucket_0, 
> and so on.
> 3. MoveTask will get rid of the taskId directories if present and only move 
> the bucket files in them, in case OrcMergeFileOp is not run.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics

2020-06-22 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23668:
---
Attachment: HIVE-23668.05.patch
Status: Patch Available  (was: In Progress)

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23668) Clean up Task for Hive Metrics

2020-06-22 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23668:
---
Status: In Progress  (was: Patch Available)

> Clean up Task for Hive Metrics
> --
>
> Key: HIVE-23668
> URL: https://issues.apache.org/jira/browse/HIVE-23668
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23668.01.patch, HIVE-23668.02.patch, 
> HIVE-23668.03.patch, HIVE-23668.04.patch, HIVE-23668.05.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Environment: (was: *strong text*)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature over 
> LLAP's current dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Component/s: llap

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature over 
> LLAP's current dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141814#comment-17141814
 ] 

Syed Shameerur Rahman commented on HIVE-23737:
--

[~gopalv] [~prasanth_j] Any thoughts on this?

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
> Environment: *strong text*
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature over 
> LLAP's current dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-23737:



> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
> Environment: *strong text*
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could reuse that feature in LLAP. 
> There are some added advantages to using Tez's dagDelete feature over 
> LLAP's current dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23736?focusedWorklogId=449068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449068
 ]

ASF GitHub Bot logged work on HIVE-23736:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 08:15
Start Date: 22/Jun/20 08:15
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1158:
URL: https://github.com/apache/hive/pull/1158


   Testing done:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=vector_topnkey.q,topnkey_grouping_sets_order.q,topnkey_windowing_order.q,topnkey_grouping_sets_functions.q,topnkey_order_null.q,topnkey_grouping_sets.q,topnkey_windowing.q
 -pl itests/qtest -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449068)
Remaining Estimate: 0h
Time Spent: 10m

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both the Reduce Sink and TopNKey operators have Top-N key filtering 
> functionality. If a TNK is introduced, this functionality is done twice.
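As a sketch of the idea (assuming ReduceSinkDesc exposes its top-n limit via setTopN and that -1 means disabled; the helper below is hypothetical):

{code:java}
// Illustrative only: if a TopNKeyOperator already filters the top-n keys in
// front of this ReduceSinkOperator, switch off the RS-side top-n filtering.
if (hasTopNKeyParent(reduceSinkOp)) {   // hypothetical helper
  reduceSinkOp.getConf().setTopN(-1);   // assumption: -1 disables top-n in ReduceSinkDesc
}
{code}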



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23736:
--
Labels: pull-request-available  (was: )

> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Both the Reduce Sink and TopNKey operators have Top-N key filtering 
> functionality. If a TNK is introduced, this functionality is done twice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-06-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-23736:
-


> Disable topn in ReduceSinkOp if a TNK is introduced
> ---
>
> Key: HIVE-23736
> URL: https://issues.apache.org/jira/browse/HIVE-23736
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>
> Both the Reduce Sink and TopNKey operators have Top-N key filtering 
> functionality. If a TNK is introduced, this functionality is done twice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)