[GitHub] [hive] kgyrtkirk commented on a change in pull request #1341: HIVE-23959 bulk stat removal

2020-07-30 Thread GitBox


kgyrtkirk commented on a change in pull request #1341:
URL: https://github.com/apache/hive/pull/1341#discussion_r463435574



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -9483,6 +9483,81 @@ private void dropPartitionColumnStatisticsNoTxn(
 queryWithParams.getLeft().closeAll();
   }
 
+  @Override
+  public void deleteAllPartitionColumnStatistics(TableName tn, String 
writeIdList) {
+
+String catName = tn.getCat();
+String dbName = tn.getDb();
+String tableName = tn.getTable();
+
+Query query = null;
+dbName = org.apache.commons.lang3.StringUtils.defaultString(dbName, 
Warehouse.DEFAULT_DATABASE_NAME);
+catName = normalizeIdentifier(catName);
+if (tableName == null) {
+  throw new RuntimeException("Table name is null.");
+}
+boolean ret = false;
+try {
+  openTransaction();
+  MTable mTable = getMTable(catName, dbName, tableName);
+
+  query = pm.newQuery(MPartitionColumnStatistics.class);
+
+  Object engine = null;

Review comment:
   Oh, sorry; I plan to disable this option by default and leave only a targeted test to cover it,
   but I wanted to do a sanity check that this works when all tests are executed (and it does).
   
   I should have marked it as WIP or something; I'll clean up things like this in the next version.
   
   Thank you for taking a look! I was testing with psql so far, but I'll check MySQL as well!
   








[GitHub] [hive] kasakrisz commented on a change in pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


kasakrisz commented on a change in pull request #1324:
URL: https://github.com/apache/hive/pull/1324#discussion_r463430942



##
File path: 
ql/src/test/results/clientpositive/llap/auto_join_reordering_values.q.out
##
@@ -144,122 +144,30 @@ STAGE PLANS:
 tag: 0
 value expressions: _col0 (type: int), _col2 (type: 
int), _col3 (type: int)
 auto parallelism: true
-Execution mode: vectorized, llap
-LLAP IO: no inputs
-Path -> Alias:
- A masked pattern was here 
-Path -> Partition:
- A masked pattern was here 
-Partition
-  base file name: orderpayment_small
-  input format: org.apache.hadoop.mapred.TextInputFormat
-  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-  properties:
-bucket_count -1
-bucketing_version 2
-column.name.delimiter ,
-columns dealid,date,time,cityid,userid
-columns.types int:string:string:int:int
- A masked pattern was here 
-name default.orderpayment_small
-serialization.format 1
-serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-
-input format: org.apache.hadoop.mapred.TextInputFormat
-output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
-properties:
-  bucketing_version 2
-  column.name.delimiter ,
-  columns dealid,date,time,cityid,userid
-  columns.comments 
-  columns.types int:string:string:int:int
- A masked pattern was here 
-  name default.orderpayment_small
-  serialization.format 1
-  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-name: default.orderpayment_small
-  name: default.orderpayment_small
-Truncated Path -> Alias:
-  /orderpayment_small [orderpayment]
-Map 6 
-Map Operator Tree:
-TableScan
-  alias: dim_pay_date
-  filterExpr: date is not null (type: boolean)
-  Statistics: Num rows: 1 Data size: 94 Basic stats: COMPLETE 
Column stats: COMPLETE
-  GatherStats: false
   Filter Operator
 isSamplingPred: false
-predicate: date is not null (type: boolean)
-Statistics: Num rows: 1 Data size: 94 Basic stats: 
COMPLETE Column stats: COMPLETE
+predicate: dealid is not null (type: boolean)
+Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
 Select Operator
-  expressions: date (type: string)

Review comment:
   The TS which pulls only the column `date` was not merged because `validPreConditions` failed:
   ```
   2020-07-30T23:20:32,796 DEBUG [a51ac124-5fb2-43bc-a0fc-f27099762584 main] 
optimizer.SharedWorkOptimizer: After SharedWorkSJOptimizer:
   
TS[0]-FIL[44]-SEL[2]-RS[15]-MERGEJOIN[89]-RS[18]-MERGEJOIN[90]-RS[21]-MERGEJOIN[91]-RS[24]-MERGEJOIN[92]-SEL[27]-LIM[28]-FS[29]
   TS[3]-FIL[45]-SEL[5]-RS[16]-MERGEJOIN[89]
   TS[6]-FIL[46]-SEL[8]-RS[19]-MERGEJOIN[90]
   TS[9]-FIL[47]-SEL[11]-RS[22]-MERGEJOIN[91]
   TS[12]-FIL[48]-SEL[14]-RS[25]-MERGEJOIN[92]
   ```
   
   Both have the same output works:
   ```
   TS[0] 
   alias = "orderpayment"
   dbName = "default"
   tableName = "orderpayment_small"
   neededColumns = {ArrayList@24085}  size = 4
0 = "dealid"
1 = "date"
2 = "cityid"
3 = "userid"
   outputWorksOps1 = {HashSet@24017}  size = 2
0 = {ReduceSinkOperator@24028} "RS[18]"
1 = {CommonMergeJoinOperator@24029} "MERGEJOIN[89]"
   ```
   ```
   TS[3]
   alias = "dim_pay_date"
   dbName = "default"
   tableName = "orderpayment_small"
   neededColumns = {ArrayList@23791}  size = 1
0 = "date"
   outputWorksOps2 = {HashSet@24022}  size = 2
0 = {ReduceSinkOperator@24028} "RS[18]"
1 = {CommonMergeJoinOperator@24029} "MERGEJOIN[89]"
   ```
   









[GitHub] [hive] kasakrisz commented on a change in pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


kasakrisz commented on a change in pull request #1324:
URL: https://github.com/apache/hive/pull/1324#discussion_r463415834



##
File path: 
ql/src/test/results/clientpositive/llap/annotate_stats_join_pkfk.q.out
##
@@ -1191,14 +1191,6 @@ STAGE PLANS:
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 12 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
-Execution mode: vectorized, llap

Review comment:
   I checked this with the debugger:
   ```
   TS[6]
   dbName = "default"
   tableName = "store_n0"
   neededColumns = {ArrayList@24024}  size = 1
0 = "s_store_sk"
   
   TS[3]
   dbName = "default"
   tableName = "store_n0"
   neededColumns = {ArrayList@24053}  size = 2
0 = "s_store_sk"
1 = "s_floor_space"
   ```








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463406580



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1574,45 +1576,46 @@ public int compareTo(CompressedOwid other) {
   this.orcSplit = orcSplit;
 
   try {
-final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
-if (deleteDeltaDirs.length > 0) {
+if (orcSplit.getDeltas().size() > 0) {
   AcidOutputFormat.Options orcSplitMinMaxWriteIds =
   AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
-  for (Path deleteDeltaDir : deleteDeltaDirs) {
-if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
-  continue;
-}
-Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
-new OrcRawRecordMerger.Options().isCompacting(false), null);
-for (Path deleteDeltaFile : deleteDeltaFiles) {
-  try {
-ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
-OrcTail orcTail = readerData.orcTail;
-if (orcTail.getFooter().getNumberOfRows() <= 0) {
-  continue; // just a safe check to ensure that we are not 
reading empty delete files.
-}
-OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
-if (!deleteKeyInterval.isIntersects(keyInterval)) {
-  // If there is no intersection between data and delete 
delta, do not read delete file
-  continue;
-}
-// Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
-// For LLAP cases we need to create it here.
-Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader :
-OrcFile.createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
-totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
-DeleteReaderValue deleteReaderValue = new 
DeleteReaderValue(deleteDeltaReader,
-deleteDeltaFile, readerOptions, bucket, validWriteIdList, 
isBucketedTable, conf,
-keyInterval, orcSplit);
-DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
-if (deleteReaderValue.next(deleteRecordKey)) {
-  sortMerger.put(deleteRecordKey, deleteReaderValue);
-} else {
-  deleteReaderValue.close();
+  for (AcidInputFormat.DeltaMetaData deltaMetaData : 
orcSplit.getDeltas()) {
+for (Path deleteDeltaDir : 
deltaMetaData.getPaths(orcSplit.getRootDir())) {
+  if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+LOG.debug("Skipping delete delta dir {}", deleteDeltaDir);
+continue;
+  }
+  for (AcidInputFormat.DeltaFileMetaData fileMetaData : 
deltaMetaData.getDeltaFiles()) {
+Path deleteDeltaFile = fileMetaData.getPath(deleteDeltaDir, 
bucket);
+try {
+  ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag, fileMetaData.getFileId(deleteDeltaDir, bucket));
+  OrcTail orcTail = readerData.orcTail;
+  if (orcTail.getFooter().getNumberOfRows() <= 0) {
+continue; // just a safe check to ensure that we are not 
reading empty delete files.
+  }
+  OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+  if (!deleteKeyInterval.isIntersects(keyInterval)) {
+// If there is no intersection between data and delete 
delta, do not read delete file
+continue;
+  }
+  // Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
+  // For LLAP cases we need to create it here.
+  Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader : OrcFile
+  .createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
+  totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
+  DeleteReaderValue deleteReaderValue =
+  new DeleteReaderValue(deleteDeltaReader, 
deleteDeltaFile, readerOptions, bucket, validWriteIdList,
+  isBucketedTable, conf, keyInterval, orcSplit);
+  DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
+  if (deleteReaderValue.next(deleteRecordKey)

[GitHub] [hive] pvary commented on a change in pull request #1341: HIVE-23959 bulk stat removal

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1341:
URL: https://github.com/apache/hive/pull/1341#discussion_r463405230



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -9483,6 +9483,81 @@ private void dropPartitionColumnStatisticsNoTxn(
 queryWithParams.getLeft().closeAll();
   }
 
+  @Override
+  public void deleteAllPartitionColumnStatistics(TableName tn, String 
writeIdList) {
+
+String catName = tn.getCat();
+String dbName = tn.getDb();
+String tableName = tn.getTable();
+
+Query query = null;
+dbName = org.apache.commons.lang3.StringUtils.defaultString(dbName, 
Warehouse.DEFAULT_DATABASE_NAME);
+catName = normalizeIdentifier(catName);
+if (tableName == null) {
+  throw new RuntimeException("Table name is null.");
+}
+boolean ret = false;
+try {
+  openTransaction();
+  MTable mTable = getMTable(catName, dbName, tableName);
+
+  query = pm.newQuery(MPartitionColumnStatistics.class);
+
+  Object engine = null;

Review comment:
   This seems strange. Setting this to null and one line below checking








[GitHub] [hive] pvary commented on a change in pull request #1341: HIVE-23959 bulk stat removal

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1341:
URL: https://github.com/apache/hive/pull/1341#discussion_r463404741



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -2960,4 +2977,22 @@ public void lockDbTable(String tableName) throws 
MetaException {
   throw new MetaException("Error while locking table " + tableName + ": " 
+ sqle.getMessage());
 }
   }
+
+  public void deleteColumnStatsState(long tbl_id) throws MetaException {
+// @formatter:off
+String queryText = ""
++ "delete from " + PARTITION_PARAMS + " pp"
++ " using " + PARTITIONS + " p "
++ " where "
++ "   p.\"PART_ID = pp.\"PART_ID\""

Review comment:
   Please carefully check every `"`. Postgres and MySQL are very problematic.
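   For reference, a minimal sketch of what balanced quoting could look like for the fragment
   quoted above (the closing quote after the first `PART_ID` is what is missing; note also that
   double-quoted identifiers only work on MySQL with the ANSI_QUOTES mode, and `DELETE ... USING`
   itself is Postgres syntax, so this sketch does not address portability):
   ```
   String queryText = ""
       + "delete from " + PARTITION_PARAMS + " pp"
       + " using " + PARTITIONS + " p "
       + " where "
       // closing quote added after the first PART_ID
       + "   p.\"PART_ID\" = pp.\"PART_ID\"";
   ```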








[GitHub] [hive] pvary commented on a change in pull request #1341: HIVE-23959 bulk stat removal

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1341:
URL: https://github.com/apache/hive/pull/1341#discussion_r463404254



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -984,16 +984,23 @@ private boolean isViewTable(String catName, String 
dbName, String tblName) throw
   part.setCatName(catName);
   part.setDbName(dbName);
   part.setTableName(tblName);
-  if (fields[4] != null) 
part.setCreateTime(MetastoreDirectSqlUtils.extractSqlInt(fields[4]));
-  if (fields[5] != null) 
part.setLastAccessTime(MetastoreDirectSqlUtils.extractSqlInt(fields[5]));
+  if (fields[4] != null) {
+part.setCreateTime(MetastoreDirectSqlUtils.extractSqlInt(fields[4]));
+  }
+  if (fields[5] != null) {
+
part.setLastAccessTime(MetastoreDirectSqlUtils.extractSqlInt(fields[5]));
+  }
   Long writeId = MetastoreDirectSqlUtils.extractSqlLong(fields[14]);
   if (writeId != null) {
 part.setWriteId(writeId);
   }
   partitions.put(partitionId, part);
 
 
-  if (sdId == null) continue; // Probably a view.
+  if (sdId == null)

Review comment:
   Nit: formatting 
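   For example, a braced form consistent with the surrounding changes (sketch only):
   ```
   if (sdId == null) {
     continue; // Probably a view.
   }
   ```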








[GitHub] [hive] aasha opened a new pull request #1345: HIVE-23961 : Enable external table replication by default

2020-07-30 Thread GitBox


aasha opened a new pull request #1345:
URL: https://github.com/apache/hive/pull/1345


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [hive] xiaomengzhang commented on pull request #1344: HIVE-23962 Make bin/hive pick user defined jdbc url

2020-07-30 Thread GitBox


xiaomengzhang commented on pull request #1344:
URL: https://github.com/apache/hive/pull/1344#issuecomment-666894541


   @nrg4878 






[GitHub] [hive] xiaomengzhang opened a new pull request #1344: HIVE-23962 Make bin/hive pick user defined jdbc url

2020-07-30 Thread GitBox


xiaomengzhang opened a new pull request #1344:
URL: https://github.com/apache/hive/pull/1344


   Add an env variable BEELINE_URL_LEGACY; when this value is not empty,
   run "beeline -c $BEELINE_URL_LEGACY".
   
   Change-Id: I9bd7f34612e2bd246d5cba37ddfd8a582038bbdd
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [hive] jcamachor commented on a change in pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


jcamachor commented on a change in pull request #1324:
URL: https://github.com/apache/hive/pull/1324#discussion_r463040481



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
   }
 }
 
+if (LOG.isDebugEnabled()) {

Review comment:
   Can we move the new call before the 
`if(pctx.getConf().getBoolVar(ConfVars.HIVE_SHARED_WORK_REUSE_MAPJOIN_CACHE)) 
{` block? It makes sense to trigger that block at the very end in case we 
continue adding phases.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
   }
 }
 
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Before SharedWorkOptimizer #2:\n" + 
Operator.toString(pctx.getTopOps().values()));
+}
+
+// Execute shared work optimization
+new BaseSharedWorkOptimizer().sharedWorkOptimization(

Review comment:
   Can we put this additional step under a new flag (`true` by default)?
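   As an illustration, the extra pass could be guarded the same way as the existing
   mapjoin-cache block; `HIVE_SHARED_WORK_MERGE_TS_SCHEMA` is a hypothetical flag name,
   not one that exists in ConfVars:
   ```
   // hypothetical flag (name is illustrative only), defaulting to true
   if (pctx.getConf().getBoolVar(ConfVars.HIVE_SHARED_WORK_MERGE_TS_SCHEMA)) {
     new BaseSharedWorkOptimizer().sharedWorkOptimization(
         pctx, optimizerCache, tableNameToOps, sortedTables, false);
   }
   ```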

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -273,258 +287,332 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
 return pctx;
   }
 
-  private static boolean sharedWorkOptimization(ParseContext pctx, 
SharedWorkOptimizerCache optimizerCache,
-  ArrayListMultimap tableNameToOps, 
List> sortedTables,
-  boolean removeSemijoin) throws SemanticException {
-// Boolean to keep track of whether this method actually merged any TS 
operators
-boolean mergedExecuted = false;
-
-Multimap existingOps = 
ArrayListMultimap.create();
-Set> removedOps = new HashSet<>();
-for (Entry tablePair : sortedTables) {
-  String tableName = tablePair.getKey();
-  for (TableScanOperator discardableTsOp : tableNameToOps.get(tableName)) {
-if (removedOps.contains(discardableTsOp)) {
-  LOG.debug("Skip {} as it has already been removed", discardableTsOp);
-  continue;
-}
-Collection prevTsOps = existingOps.get(tableName);
-for (TableScanOperator retainableTsOp : prevTsOps) {
-  if (removedOps.contains(retainableTsOp)) {
-LOG.debug("Skip {} as it has already been removed", 
retainableTsOp);
+  private static class BaseSharedWorkOptimizer {

Review comment:
   Can we add a clarifying comment to the new internal classes explaining the difference between the two of them?
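   As an illustration, the comment could look roughly like this (wording is only a guess based
   on the PR title, which describes taking the union of columns in mergeable TableScans):
   ```
   /**
    * Base implementation: merges TableScan operators that read the same table.
    * The schema-merging variant introduced in this PR additionally takes the union
    * of the needed columns of the mergeable TableScans, so scans with different
    * projected columns can still be merged.
    */
   ```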

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
   }
 }
 
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Before SharedWorkOptimizer #2:\n" + 
Operator.toString(pctx.getTopOps().values()));

Review comment:
   This is the same as `After SharedWorkSJOptimizer`, no need to print it 
again.

##
File path: 
ql/src/test/results/clientpositive/llap/annotate_stats_join_pkfk.q.out
##
@@ -1191,14 +1191,6 @@ STAGE PLANS:
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 12 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
-Execution mode: vectorized, llap

Review comment:
   I do not see projected columns, so it is difficult to confirm whether 
this is because of the schema. Is it?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
   }
 }
 
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Before SharedWorkOptimizer #2:\n" + 
Operator.toString(pctx.getTopOps().values()));
+}
+
+// Execute shared work optimization
+new BaseSharedWorkOptimizer().sharedWorkOptimization(
+pctx, optimizerCache, tableNameToOps, sortedTables, false);
+
+if (LOG.isDebugEnabled()) {
+  LOG.debug("After SharedWorkOptimizer #2:\n" + 
Operator.toString(pctx.getTopOps().values()));

Review comment:
   `After SharedWorkOptimizer merging TS schema`?

##
File path: 
ql/src/test/results/clientpositive/llap/auto_join_reordering_values.q.out
##
@@ -144,122 +144,30 @@ STAGE PLANS:
 tag: 0
 value expressions: _col0 (type: int), _col2 (type: 
int), _col3 (type: int)
 auto parallelism: true
-Execution mode: vectorized, llap
-LLAP IO: no inputs
-Path -> Alias:
- A masked pattern was here 
-Path -> Partition:
- A masked pattern was here 
-Partition
-  base file name: order

[GitHub] [hive] jcamachor merged pull request #1310: HIVE-23911: CBO fails when query has distinct in function and having clause

2020-07-30 Thread GitBox


jcamachor merged pull request #1310:
URL: https://github.com/apache/hive/pull/1310


   






[GitHub] [hive] dengzhhu653 edited a comment on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current

2020-07-30 Thread GitBox


dengzhhu653 edited a comment on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666864106


   ... you can take a look at "Understanding Hive Branches" at the above link, and add some tests.






[GitHub] [hive] dengzhhu653 commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current positi

2020-07-30 Thread GitBox


dengzhhu653 commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666864106


   ... you can take a look at "Understanding Hive Branches" at the above link






[GitHub] [hive] jcamachor commented on a change in pull request #1326: HIVE-23941: Refactor TypeCheckProcFactory to be database agnostic

2020-07-30 Thread GitBox


jcamachor commented on a change in pull request #1326:
URL: https://github.com/apache/hive/pull/1326#discussion_r463353432



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -868,17 +849,15 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
   String funcText = getFunctionText(node, isFunction);
   T expr;
   if (funcText.equals(".")) {
-// "." : FIELD Expression
-
 assert (children.size() == 2);
 // Only allow constant field name for now
 assert (exprFactory.isConstantExpr(children.get(1)));
 T object = children.get(0);
-
+   

Review comment:
   nit. whitespace (there are a few below too)

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1063,19 +1007,20 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
   if (numEntries == 1) {
 children.addAll(expressions.asMap().values().iterator().next());
 funcText = "in";
-genericUDF = new GenericUDFIn();
+fi = exprFactory.getFunctionInfo("in");
   } else {
+FunctionInfo inFunctionInfo  = exprFactory.getFunctionInfo("in");
 for (Collection c : expressions.asMap().values()) {
-  newExprs.add(
-  exprFactory.createFuncCallExpr(
-  new GenericUDFIn(), "in", (List) c));
+  newExprs.add(exprFactory.createFuncCallExpr(null, inFunctionInfo,
+ "in", (List) c));
 }
 children.addAll(newExprs);
 funcText = "or";
-genericUDF = new GenericUDFOPOr();
+fi = exprFactory.getFunctionInfo("or");
+   functionInfoChangedFromIn = true;

Review comment:
   nit. indentation off here too

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java
##
@@ -545,4 +551,67 @@ public RexNode foldExpression(RexNode expr) {
 return result.get(0);
   }
 
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isAndFunction(FunctionInfo fi) {
+return fi.getGenericUDF() instanceof GenericUDFOPAnd;
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isOrFunction(FunctionInfo fi) {
+return fi.getGenericUDF() instanceof GenericUDFOPOr;
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isInFunction(FunctionInfo fi) {
+return fi.getGenericUDF() instanceof GenericUDFIn;
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isCompareFunction(FunctionInfo fi) {
+return fi.getGenericUDF() instanceof GenericUDFBaseCompare;
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isEqualFunction(FunctionInfo fi) {
+return fi.getGenericUDF() instanceof GenericUDFOPEqual
+&& !(fi.getGenericUDF() instanceof GenericUDFOPEqualNS);
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isConsistentWithinQuery(FunctionInfo fi) {
+//TODO: don't getGenericUDF
+return FunctionRegistry.isConsistentWithinQuery(fi.getGenericUDF());
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  public boolean isStateful(FunctionInfo fi) {
+//TODO: don't getGenericUDF

Review comment:
   This TODO? Is there a follow-up?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/FunctionHelper.java
##
@@ -82,6 +82,20 @@ default RexNode foldExpression(RexNode expr) {
 return expr;
   }
 
+  boolean isAndFunction(FunctionInfo fi);

Review comment:
   Would you mind adding a short comment to all these methods, even if it may seem obvious for some of them?
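   For instance (wording is only a suggestion, not taken from the patch):
   ```
   /** Returns true if the given function is the logical AND operator. */
   boolean isAndFunction(FunctionInfo fi);

   /** Returns true if the given function is the logical OR operator. */
   boolean isOrFunction(FunctionInfo fi);

   /** Returns true if the given function is the IN operator. */
   boolean isInFunction(FunctionInfo fi);
   ```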

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveFunctionInfo.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
+import org.apache.hadoop.hive.ql.udf.ptf.TableFunctionResolver;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * HiveFunctionInfo.

Revie

[GitHub] [hive] pkumarsinha opened a new pull request #1343: HIVE-23960 : Partition with no column statistics leads to unbalanced …

2020-07-30 Thread GitBox


pkumarsinha opened a new pull request #1343:
URL: https://github.com/apache/hive/pull/1343


   …calls to openTransaction/commitTransaction error during 
get_partitions_by_names
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [hive] pvargacl commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvargacl commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463249932



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1574,45 +1576,46 @@ public int compareTo(CompressedOwid other) {
   this.orcSplit = orcSplit;
 
   try {
-final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
-if (deleteDeltaDirs.length > 0) {
+if (orcSplit.getDeltas().size() > 0) {
   AcidOutputFormat.Options orcSplitMinMaxWriteIds =
   AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
-  for (Path deleteDeltaDir : deleteDeltaDirs) {
-if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
-  continue;
-}
-Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
-new OrcRawRecordMerger.Options().isCompacting(false), null);
-for (Path deleteDeltaFile : deleteDeltaFiles) {
-  try {
-ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
-OrcTail orcTail = readerData.orcTail;
-if (orcTail.getFooter().getNumberOfRows() <= 0) {
-  continue; // just a safe check to ensure that we are not 
reading empty delete files.
-}
-OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
-if (!deleteKeyInterval.isIntersects(keyInterval)) {
-  // If there is no intersection between data and delete 
delta, do not read delete file
-  continue;
-}
-// Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
-// For LLAP cases we need to create it here.
-Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader :
-OrcFile.createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
-totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
-DeleteReaderValue deleteReaderValue = new 
DeleteReaderValue(deleteDeltaReader,
-deleteDeltaFile, readerOptions, bucket, validWriteIdList, 
isBucketedTable, conf,
-keyInterval, orcSplit);
-DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
-if (deleteReaderValue.next(deleteRecordKey)) {
-  sortMerger.put(deleteRecordKey, deleteReaderValue);
-} else {
-  deleteReaderValue.close();
+  for (AcidInputFormat.DeltaMetaData deltaMetaData : 
orcSplit.getDeltas()) {
+for (Path deleteDeltaDir : 
deltaMetaData.getPaths(orcSplit.getRootDir())) {
+  if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+LOG.debug("Skipping delete delta dir {}", deleteDeltaDir);
+continue;
+  }
+  for (AcidInputFormat.DeltaFileMetaData fileMetaData : 
deltaMetaData.getDeltaFiles()) {
+Path deleteDeltaFile = fileMetaData.getPath(deleteDeltaDir, 
bucket);
+try {
+  ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag, fileMetaData.getFileId(deleteDeltaDir, bucket));
+  OrcTail orcTail = readerData.orcTail;
+  if (orcTail.getFooter().getNumberOfRows() <= 0) {
+continue; // just a safe check to ensure that we are not 
reading empty delete files.
+  }
+  OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+  if (!deleteKeyInterval.isIntersects(keyInterval)) {
+// If there is no intersection between data and delete 
delta, do not read delete file
+continue;
+  }
+  // Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
+  // For LLAP cases we need to create it here.
+  Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader : OrcFile
+  .createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
+  totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
+  DeleteReaderValue deleteReaderValue =
+  new DeleteReaderValue(deleteDeltaReader, 
deleteDeltaFile, readerOptions, bucket, validWriteIdList,
+  isBucketedTable, conf, keyInterval, orcSplit);
+  DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
+  if (deleteReaderValue.next(deleteRecordK

[GitHub] [hive] pvargacl commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvargacl commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463237211



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -680,14 +681,15 @@ public void setBaseAndInnerReader(
* @param path The Orc file path we want to get the OrcTail for
* @param conf The Configuration to access LLAP
* @param cacheTag The cacheTag needed to get OrcTail from LLAP IO cache
+   * @param fileKey fileId of the Orc file (either the Long fileId of HDFS or 
the SyntheticFileId)
* @return ReaderData object where the orcTail is not null. Reader can be 
null, but if we had to create
* one we return that as well for further reuse.
*/
-  private static ReaderData getOrcTail(Path path, Configuration conf, CacheTag 
cacheTag) throws IOException {
+  private static ReaderData getOrcTail(Path path, Configuration conf, CacheTag 
cacheTag, Object fileKey) throws IOException {

Review comment:
   I am not sure we will ever want to use it without fileId.








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463171159



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidInputFormat.java
##
@@ -83,6 +94,34 @@ public void testDeltaMetaConstructWithState() throws 
Exception {
 assertThat(deltaMetaData.getStmtIds().get(2), is(99));
   }
 
+  @Test
+  public void testDeltaMetaWithFile() throws Exception {

Review comment:
   Test for all of the different serialization options
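   A minimal round-trip sketch for one such case, relying only on the `write`/`readFields`
   methods and the constructors visible in this patch (the concrete values are illustrative):
   ```
   @Test
   public void testDeltaFileMetaDataRoundTrip() throws Exception {
     // modTime, length, attemptId, fileId -- illustrative values only
     AcidInputFormat.DeltaFileMetaData original =
         new AcidInputFormat.DeltaFileMetaData(1000L, 200L, 1, 123L);

     ByteArrayOutputStream bos = new ByteArrayOutputStream();
     original.write(new DataOutputStream(bos));
     byte[] serialized = bos.toByteArray();

     // Deserialize into a fresh instance, re-serialize, and compare the bytes.
     AcidInputFormat.DeltaFileMetaData copy = new AcidInputFormat.DeltaFileMetaData();
     copy.readFields(new DataInputStream(new ByteArrayInputStream(serialized)));
     ByteArrayOutputStream bos2 = new ByteArrayOutputStream();
     copy.write(new DataOutputStream(bos2));

     assertArrayEquals(serialized, bos2.toByteArray());
   }
   // Similar cases would cover attemptId == null and fileId == null.
   ```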








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463169065



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1574,45 +1576,46 @@ public int compareTo(CompressedOwid other) {
   this.orcSplit = orcSplit;
 
   try {
-final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
-if (deleteDeltaDirs.length > 0) {
+if (orcSplit.getDeltas().size() > 0) {
   AcidOutputFormat.Options orcSplitMinMaxWriteIds =
   AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
-  for (Path deleteDeltaDir : deleteDeltaDirs) {
-if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
-  continue;
-}
-Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
-new OrcRawRecordMerger.Options().isCompacting(false), null);
-for (Path deleteDeltaFile : deleteDeltaFiles) {
-  try {
-ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
-OrcTail orcTail = readerData.orcTail;
-if (orcTail.getFooter().getNumberOfRows() <= 0) {
-  continue; // just a safe check to ensure that we are not 
reading empty delete files.
-}
-OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
-if (!deleteKeyInterval.isIntersects(keyInterval)) {
-  // If there is no intersection between data and delete 
delta, do not read delete file
-  continue;
-}
-// Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
-// For LLAP cases we need to create it here.
-Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader :
-OrcFile.createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
-totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
-DeleteReaderValue deleteReaderValue = new 
DeleteReaderValue(deleteDeltaReader,
-deleteDeltaFile, readerOptions, bucket, validWriteIdList, 
isBucketedTable, conf,
-keyInterval, orcSplit);
-DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
-if (deleteReaderValue.next(deleteRecordKey)) {
-  sortMerger.put(deleteRecordKey, deleteReaderValue);
-} else {
-  deleteReaderValue.close();
+  for (AcidInputFormat.DeltaMetaData deltaMetaData : 
orcSplit.getDeltas()) {
+for (Path deleteDeltaDir : 
deltaMetaData.getPaths(orcSplit.getRootDir())) {
+  if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+LOG.debug("Skipping delete delta dir {}", deleteDeltaDir);
+continue;
+  }
+  for (AcidInputFormat.DeltaFileMetaData fileMetaData : 
deltaMetaData.getDeltaFiles()) {
+Path deleteDeltaFile = fileMetaData.getPath(deleteDeltaDir, 
bucket);
+try {
+  ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag, fileMetaData.getFileId(deleteDeltaDir, bucket));
+  OrcTail orcTail = readerData.orcTail;
+  if (orcTail.getFooter().getNumberOfRows() <= 0) {
+continue; // just a safe check to ensure that we are not 
reading empty delete files.
+  }
+  OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+  if (!deleteKeyInterval.isIntersects(keyInterval)) {
+// If there is no intersection between data and delete 
delta, do not read delete file
+continue;
+  }
+  // Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
+  // For LLAP cases we need to create it here.
+  Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader : OrcFile
+  .createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
+  totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
+  DeleteReaderValue deleteReaderValue =
+  new DeleteReaderValue(deleteDeltaReader, 
deleteDeltaFile, readerOptions, bucket, validWriteIdList,
+  isBucketedTable, conf, keyInterval, orcSplit);
+  DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
+  if (deleteReaderValue.next(deleteRecordKey)

[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463168246



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1574,45 +1576,46 @@ public int compareTo(CompressedOwid other) {
   this.orcSplit = orcSplit;
 
   try {
-final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
-if (deleteDeltaDirs.length > 0) {
+if (orcSplit.getDeltas().size() > 0) {
   AcidOutputFormat.Options orcSplitMinMaxWriteIds =
   AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
-  for (Path deleteDeltaDir : deleteDeltaDirs) {
-if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
-  continue;
-}
-Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
-new OrcRawRecordMerger.Options().isCompacting(false), null);
-for (Path deleteDeltaFile : deleteDeltaFiles) {
-  try {
-ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
-OrcTail orcTail = readerData.orcTail;
-if (orcTail.getFooter().getNumberOfRows() <= 0) {
-  continue; // just a safe check to ensure that we are not 
reading empty delete files.
-}
-OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
-if (!deleteKeyInterval.isIntersects(keyInterval)) {
-  // If there is no intersection between data and delete 
delta, do not read delete file
-  continue;
-}
-// Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
-// For LLAP cases we need to create it here.
-Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader :
-OrcFile.createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
-totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
-DeleteReaderValue deleteReaderValue = new 
DeleteReaderValue(deleteDeltaReader,
-deleteDeltaFile, readerOptions, bucket, validWriteIdList, 
isBucketedTable, conf,
-keyInterval, orcSplit);
-DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
-if (deleteReaderValue.next(deleteRecordKey)) {
-  sortMerger.put(deleteRecordKey, deleteReaderValue);
-} else {
-  deleteReaderValue.close();
+  for (AcidInputFormat.DeltaMetaData deltaMetaData : 
orcSplit.getDeltas()) {
+for (Path deleteDeltaDir : 
deltaMetaData.getPaths(orcSplit.getRootDir())) {
+  if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {

Review comment:
   isQualifiedDeleteDelta basically reparses the delta dir. Can we prevent 
this?








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463166514



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -680,14 +681,15 @@ public void setBaseAndInnerReader(
* @param path The Orc file path we want to get the OrcTail for
* @param conf The Configuration to access LLAP
* @param cacheTag The cacheTag needed to get OrcTail from LLAP IO cache
+   * @param fileKey fileId of the Orc file (either the Long fileId of HDFS or 
the SyntheticFileId)
* @return ReaderData object where the orcTail is not null. Reader can be 
null, but if we had to create
* one we return that as well for further reuse.
*/
-  private static ReaderData getOrcTail(Path path, Configuration conf, CacheTag 
cacheTag) throws IOException {
+  private static ReaderData getOrcTail(Path path, Configuration conf, CacheTag 
cacheTag, Object fileKey) throws IOException {

Review comment:
   Maybe 2 different getOrcTail methods on the LLAP IO interface? @szlta?
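   One way the two-method suggestion could look (sketch only; the no-fileKey variant
   would simply delegate to the existing implementation):
   ```
   // hypothetical convenience overload for callers without a fileId at hand
   private static ReaderData getOrcTail(Path path, Configuration conf, CacheTag cacheTag)
       throws IOException {
     return getOrcTail(path, conf, cacheTag, null);
   }
   ```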








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463163936



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -1165,8 +1168,14 @@ private static ParsedDelta parseDelta(Path path, String 
deltaPrefix, FileSystem
 throws IOException {
 ParsedDelta p = parsedDelta(path, deltaPrefix, fs, dirSnapshot);
 boolean isDeleteDelta = deltaPrefix.equals(DELETE_DELTA_PREFIX);
+List files = new ArrayList<>();
+for (FileStatus fileStatus : dirSnapshot.getFiles()) {

Review comment:
   Nit: maybe do it in the Java 8 way?
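   For example (sketch only; the loop body is not visible in this hunk, so
   `toFileStatusWithId` is a placeholder for whatever the loop actually does per file):
   ```
   List<HadoopShims.HdfsFileStatusWithId> files = dirSnapshot.getFiles().stream()
       .map(fileStatus -> toFileStatusWithId(fileStatus))
       .collect(Collectors.toList());
   ```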








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463163160



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -1113,10 +1119,13 @@ else if(statementId != parsedDelta.statementId) {
   && (last.getMinWriteId() == parsedDelta.getMinWriteId())
   && (last.getMaxWriteId() == parsedDelta.getMaxWriteId())) {
 last.getStmtIds().add(parsedDelta.getStatementId());
+for(HadoopShims.HdfsFileStatusWithId fileStatus : 
parsedDelta.getFiles()) {

Review comment:
   Nit: space before for








[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463159912



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java
##
@@ -118,70 +123,183 @@
  */
 private long visibilityTxnId;
 
+private List deltaFiles;
+
 public DeltaMetaData() {
-  this(0,0,new ArrayList(), 0);
+  this(0, 0, new ArrayList<>(), 0, new ArrayList<>());
 }
 /**
+ * @param minWriteId min writeId of the delta directory
+ * @param maxWriteId max writeId of the delta directory
  * @param stmtIds delta dir suffixes when a single txn writes > 1 delta in 
the same partition
  * @param visibilityTxnId maybe 0, if the dir name didn't have it.  
txnid:0 is always visible
+ * @param deltaFileStatuses bucketFiles in the directory
  */
-DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId) {
+DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId,
+List deltaFileStatuses) {
   this.minWriteId = minWriteId;
   this.maxWriteId = maxWriteId;
   if (stmtIds == null) {
 throw new IllegalArgumentException("stmtIds == null");
   }
   this.stmtIds = stmtIds;
   this.visibilityTxnId = visibilityTxnId;
+  this.deltaFiles = new ArrayList<>();
+  for(HadoopShims.HdfsFileStatusWithId fileStatus : deltaFileStatuses) {
+deltaFiles.add(new DeltaFileMetaData(fileStatus));
+  }
 }
+
 long getMinWriteId() {
   return minWriteId;
 }
+
 long getMaxWriteId() {
   return maxWriteId;
 }
+
 List getStmtIds() {
   return stmtIds;
 }
+
 long getVisibilityTxnId() {
   return visibilityTxnId;
 }
+
+public List getDeltaFiles() {
+  return deltaFiles;
+}
+
 @Override
 public void write(DataOutput out) throws IOException {
   out.writeLong(minWriteId);
   out.writeLong(maxWriteId);
   out.writeInt(stmtIds.size());
-  for(Integer id : stmtIds) {
+  for (Integer id : stmtIds) {
 out.writeInt(id);
   }
   out.writeLong(visibilityTxnId);
+  out.writeInt(deltaFiles.size());
+  for (DeltaFileMetaData fileMeta : deltaFiles) {
+fileMeta.write(out);
+  }
 }
+
 @Override
 public void readFields(DataInput in) throws IOException {
   minWriteId = in.readLong();
   maxWriteId = in.readLong();
   stmtIds.clear();
   int numStatements = in.readInt();
-  for(int i = 0; i < numStatements; i++) {
+  for (int i = 0; i < numStatements; i++) {
 stmtIds.add(in.readInt());
   }
   visibilityTxnId = in.readLong();
+
+  deltaFiles.clear();
+  int numFiles = in.readInt();
+  for(int i = 0; i< numFiles; i++) {
+DeltaFileMetaData file = new DeltaFileMetaData();
+file.readFields(in);
+deltaFiles.add(file);
+  }
 }
-String getName() {
+private String getName() {
   assert stmtIds.isEmpty() : "use getName(int)";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId), visibilityTxnId);
 }
-String getName(int stmtId) {
+private String getName(int stmtId) {
   assert !stmtIds.isEmpty() : "use getName()";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId, stmtId), visibilityTxnId);
 }
+
+public List getPaths(Path root) {
+  if (stmtIds.isEmpty()) {
+return Collections.singletonList(new Path(root, getName()));
+  } else {
+// To support multistatement transactions we may have multiple 
directories corresponding to one DeltaMetaData
+return getStmtIds().stream().map(stmtId -> new Path(root, 
getName(stmtId))).collect(Collectors.toList());
+  }
+}
 @Override
 public String toString() {
   return "Delta(?," + minWriteId + "," + maxWriteId + "," + stmtIds + "," 
+ visibilityTxnId + ")";
 }
   }
+  final class DeltaFileMetaData implements Writable {
+private static final int HAS_LONG_FILEID_FLAG = 1;
+private static final int HAS_ATTEMPTID_FLAG = 2;
+
+private long modTime;
+private long length;
+// Optional
+private Integer attemptId;
+// Optional
+private Long fileId;
+
+public DeltaFileMetaData() {
+}
+
+public DeltaFileMetaData(HadoopShims.HdfsFileStatusWithId fileStatus) {
+  modTime = fileStatus.getFileStatus().getModificationTime();
+  length = fileStatus.getFileStatus().getLen();
+  String attempt = 
AcidUtils.parseAttemptId(fileStatus.getFileStatus().getPath());
+  attemptId = StringUtils.isEmpty(attempt) ? null : 
Integer.parseInt(attempt);
+  fileId = fileStatus.getFileId();
+}
+
+public DeltaFileMetaData(long modTime, long length, @Nullable Integer 
attemptId, @Nullable Long fileId) {
+  this.modTime = modTime;
+  this.length = length;

[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463156735



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java
##
@@ -118,70 +123,183 @@
  */
 private long visibilityTxnId;
 
+private List deltaFiles;
+
 public DeltaMetaData() {
-  this(0,0,new ArrayList(), 0);
+  this(0, 0, new ArrayList<>(), 0, new ArrayList<>());
 }
 /**
+ * @param minWriteId min writeId of the delta directory
+ * @param maxWriteId max writeId of the delta directory
  * @param stmtIds delta dir suffixes when a single txn writes > 1 delta in 
the same partition
  * @param visibilityTxnId maybe 0, if the dir name didn't have it.  
txnid:0 is always visible
+ * @param deltaFileStatuses bucketFiles in the directory
  */
-DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId) {
+DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId,
+List deltaFileStatuses) {
   this.minWriteId = minWriteId;
   this.maxWriteId = maxWriteId;
   if (stmtIds == null) {
 throw new IllegalArgumentException("stmtIds == null");
   }
   this.stmtIds = stmtIds;
   this.visibilityTxnId = visibilityTxnId;
+  this.deltaFiles = new ArrayList<>();
+  for(HadoopShims.HdfsFileStatusWithId fileStatus : deltaFileStatuses) {
+deltaFiles.add(new DeltaFileMetaData(fileStatus));
+  }
 }
+
 long getMinWriteId() {
   return minWriteId;
 }
+
 long getMaxWriteId() {
   return maxWriteId;
 }
+
 List getStmtIds() {
   return stmtIds;
 }
+
 long getVisibilityTxnId() {
   return visibilityTxnId;
 }
+
+public List getDeltaFiles() {
+  return deltaFiles;
+}
+
 @Override
 public void write(DataOutput out) throws IOException {
   out.writeLong(minWriteId);
   out.writeLong(maxWriteId);
   out.writeInt(stmtIds.size());
-  for(Integer id : stmtIds) {
+  for (Integer id : stmtIds) {
 out.writeInt(id);
   }
   out.writeLong(visibilityTxnId);
+  out.writeInt(deltaFiles.size());
+  for (DeltaFileMetaData fileMeta : deltaFiles) {
+fileMeta.write(out);
+  }
 }
+
 @Override
 public void readFields(DataInput in) throws IOException {
   minWriteId = in.readLong();
   maxWriteId = in.readLong();
   stmtIds.clear();
   int numStatements = in.readInt();
-  for(int i = 0; i < numStatements; i++) {
+  for (int i = 0; i < numStatements; i++) {
 stmtIds.add(in.readInt());
   }
   visibilityTxnId = in.readLong();
+
+  deltaFiles.clear();
+  int numFiles = in.readInt();
+  for(int i = 0; i< numFiles; i++) {
+DeltaFileMetaData file = new DeltaFileMetaData();
+file.readFields(in);
+deltaFiles.add(file);
+  }
 }
-String getName() {
+private String getName() {
   assert stmtIds.isEmpty() : "use getName(int)";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId), visibilityTxnId);
 }
-String getName(int stmtId) {
+private String getName(int stmtId) {
   assert !stmtIds.isEmpty() : "use getName()";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId, stmtId), visibilityTxnId);
 }
+
+public List getPaths(Path root) {
+  if (stmtIds.isEmpty()) {
+return Collections.singletonList(new Path(root, getName()));
+  } else {
+// To support multistatement transactions we may have multiple 
directories corresponding to one DeltaMetaData
+return getStmtIds().stream().map(stmtId -> new Path(root, 
getName(stmtId))).collect(Collectors.toList());
+  }
+}
 @Override
 public String toString() {
   return "Delta(?," + minWriteId + "," + maxWriteId + "," + stmtIds + "," 
+ visibilityTxnId + ")";
 }
   }
+  final class DeltaFileMetaData implements Writable {

Review comment:
   Nit: newline 
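   The write/readFields pair quoted above is what ships the per-file metadata to the execution side. As a minimal sketch of the Hadoop Writable round trip it relies on (the class and field names below are made up for illustration, not taken from this PR):

```java
import java.io.*;
import org.apache.hadoop.io.Writable;

// Illustrative only: a tiny Writable with the same write/readFields symmetry.
class FileMetaSketch implements Writable {
  long modTime;
  long length;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(modTime);   // fields are written in a fixed order...
    out.writeLong(length);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    modTime = in.readLong();  // ...and read back in exactly the same order
    length = in.readLong();
  }

  public static void main(String[] args) throws IOException {
    FileMetaSketch original = new FileMetaSketch();
    original.modTime = 1596067200000L;
    original.length = 4096L;

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));

    FileMetaSketch copy = new FileMetaSketch();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println(copy.modTime + " " + copy.length);  // same values as written
  }
}
```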





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463156327



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java
##
@@ -118,70 +123,183 @@
  */
 private long visibilityTxnId;
 
+private List deltaFiles;
+
 public DeltaMetaData() {
-  this(0,0,new ArrayList(), 0);
+  this(0, 0, new ArrayList<>(), 0, new ArrayList<>());
 }
 /**

Review comment:
   Nit: newline 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463151233



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java
##
@@ -118,70 +123,183 @@
  */
 private long visibilityTxnId;
 
+private List deltaFiles;
+
 public DeltaMetaData() {
-  this(0,0,new ArrayList(), 0);
+  this(0, 0, new ArrayList<>(), 0, new ArrayList<>());
 }
 /**
+ * @param minWriteId min writeId of the delta directory
+ * @param maxWriteId max writeId of the delta directory
  * @param stmtIds delta dir suffixes when a single txn writes > 1 delta in 
the same partition
  * @param visibilityTxnId maybe 0, if the dir name didn't have it.  
txnid:0 is always visible
+ * @param deltaFileStatuses bucketFiles in the directory
  */
-DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId) {
+DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId,
+List deltaFileStatuses) {
   this.minWriteId = minWriteId;
   this.maxWriteId = maxWriteId;
   if (stmtIds == null) {
 throw new IllegalArgumentException("stmtIds == null");
   }
   this.stmtIds = stmtIds;
   this.visibilityTxnId = visibilityTxnId;
+  this.deltaFiles = new ArrayList<>();
+  for(HadoopShims.HdfsFileStatusWithId fileStatus : deltaFileStatuses) {
+deltaFiles.add(new DeltaFileMetaData(fileStatus));
+  }
 }
+
 long getMinWriteId() {
   return minWriteId;
 }
+
 long getMaxWriteId() {
   return maxWriteId;
 }
+
 List getStmtIds() {
   return stmtIds;
 }
+
 long getVisibilityTxnId() {
   return visibilityTxnId;
 }
+
+public List getDeltaFiles() {
+  return deltaFiles;
+}
+
 @Override
 public void write(DataOutput out) throws IOException {
   out.writeLong(minWriteId);
   out.writeLong(maxWriteId);
   out.writeInt(stmtIds.size());
-  for(Integer id : stmtIds) {
+  for (Integer id : stmtIds) {
 out.writeInt(id);
   }
   out.writeLong(visibilityTxnId);
+  out.writeInt(deltaFiles.size());
+  for (DeltaFileMetaData fileMeta : deltaFiles) {
+fileMeta.write(out);
+  }
 }
+
 @Override
 public void readFields(DataInput in) throws IOException {
   minWriteId = in.readLong();
   maxWriteId = in.readLong();
   stmtIds.clear();
   int numStatements = in.readInt();
-  for(int i = 0; i < numStatements; i++) {
+  for (int i = 0; i < numStatements; i++) {
 stmtIds.add(in.readInt());
   }
   visibilityTxnId = in.readLong();
+
+  deltaFiles.clear();
+  int numFiles = in.readInt();
+  for(int i = 0; i< numFiles; i++) {
+DeltaFileMetaData file = new DeltaFileMetaData();
+file.readFields(in);
+deltaFiles.add(file);
+  }
 }
-String getName() {
+private String getName() {
   assert stmtIds.isEmpty() : "use getName(int)";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId), visibilityTxnId);
 }
-String getName(int stmtId) {
+private String getName(int stmtId) {
   assert !stmtIds.isEmpty() : "use getName()";
   return AcidUtils.addVisibilitySuffix(AcidUtils
   .deleteDeltaSubdir(minWriteId, maxWriteId, stmtId), visibilityTxnId);
 }
+
+public List getPaths(Path root) {
+  if (stmtIds.isEmpty()) {
+return Collections.singletonList(new Path(root, getName()));
+  } else {
+// To support multistatement transactions we may have multiple 
directories corresponding to one DeltaMetaData
+return getStmtIds().stream().map(stmtId -> new Path(root, 
getName(stmtId))).collect(Collectors.toList());
+  }
+}
 @Override

Review comment:
   Nit: newline





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463150602



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java
##
@@ -118,70 +123,183 @@
  */
 private long visibilityTxnId;
 
+private List deltaFiles;
+
 public DeltaMetaData() {
-  this(0,0,new ArrayList(), 0);
+  this(0, 0, new ArrayList<>(), 0, new ArrayList<>());
 }
 /**
+ * @param minWriteId min writeId of the delta directory
+ * @param maxWriteId max writeId of the delta directory
  * @param stmtIds delta dir suffixes when a single txn writes > 1 delta in 
the same partition
  * @param visibilityTxnId maybe 0, if the dir name didn't have it.  
txnid:0 is always visible
+ * @param deltaFileStatuses bucketFiles in the directory
  */
-DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId) {
+DeltaMetaData(long minWriteId, long maxWriteId, List stmtIds, 
long visibilityTxnId,
+List deltaFileStatuses) {
   this.minWriteId = minWriteId;
   this.maxWriteId = maxWriteId;
   if (stmtIds == null) {
 throw new IllegalArgumentException("stmtIds == null");
   }
   this.stmtIds = stmtIds;
   this.visibilityTxnId = visibilityTxnId;
+  this.deltaFiles = new ArrayList<>();
+  for(HadoopShims.HdfsFileStatusWithId fileStatus : deltaFileStatuses) {

Review comment:
   Nit: maybe a space after the for keyword? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zeroflag commented on a change in pull request #1323: HIVE-23723. Limit operator pushdown through LOJ (amagyar)

2020-07-30 Thread GitBox


zeroflag commented on a change in pull request #1323:
URL: https://github.com/apache/hive/pull/1323#discussion_r463125357



##
File path: ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out
##
@@ -108,7 +111,7 @@ limit 1
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-0  val_0   0   val_0
+238val_238 238 val_238

Review comment:
   I'll try but I don't think it will help in this case. My understanding is that SORT_QUERY_RESULTS sorts the rows written to the standard output on the client/driver side.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kgyrtkirk merged pull request #1303: HIVE-23915: Improve Github PR Template

2020-07-30 Thread GitBox


kgyrtkirk merged pull request #1303:
URL: https://github.com/apache/hive/pull/1303


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] risdenk opened a new pull request #1342: HIVE-23958: HiveServer2 should support additional keystore/truststore types besides JKS

2020-07-30 Thread GitBox


risdenk opened a new pull request #1342:
URL: https://github.com/apache/hive/pull/1342


   Ensures that HS2 uses the default JDK keystore/truststore type when interacting with Thrift and Jetty. It isn't possible to add a test for this since a FIPS-compatible keystore/truststore needs a special TLS provider that isn't shipped with the JDK. Using JKS or PKCS12 would work in either case since the JDK supports both in JDK 8 and 11.
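   For context, the change boils down to asking the JDK for its configured defaults instead of hard-coding "JKS". A minimal sketch of that pattern with standard JSSE APIs (the path and password below are placeholders, and this is not the actual HS2 code):

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class DefaultTrustStoreSketch {
  public static void main(String[] args) throws Exception {
    // KeyStore.getDefaultType() honors the JDK/provider configuration
    // (e.g. "pkcs12" on recent JDKs, or a FIPS type when such a provider
    // is installed) instead of assuming "JKS".
    KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
    try (FileInputStream in = new FileInputStream("/path/to/truststore")) { // placeholder path
      trustStore.load(in, "changeit".toCharArray());                        // placeholder password
    }

    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);

    SSLContext ctx = SSLContext.getInstance("TLS");
    ctx.init(null, tmf.getTrustManagers(), null);
  }
}
```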
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kgyrtkirk opened a new pull request #1341: HIVE-23959 bulk stat removal

2020-07-30 Thread GitBox


kgyrtkirk opened a new pull request #1341:
URL: https://github.com/apache/hive/pull/1341


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zeroflag commented on a change in pull request #1323: HIVE-23723. Limit operator pushdown through LOJ (amagyar)

2020-07-30 Thread GitBox


zeroflag commented on a change in pull request #1323:
URL: https://github.com/apache/hive/pull/1323#discussion_r463115806



##
File path: ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out
##
@@ -1207,7 +1223,6 @@ limit 1 offset 1
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-0  val_0   0   val_0

Review comment:
   Good catch, it might be because there is an offset=1 that belongs to the limit. The Limit is pushed through the LOJ, but the original one is also kept. The pushed-down limit emits 1 row, and the kept one then emits 0 rows because of offset=1.
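   A toy illustration of that interaction with plain Java streams (not Hive operators): when a `LIMIT 1` is pushed below and the original `LIMIT 1 OFFSET 1` is kept on top, the lower stage hands up a single row and the upper stage's offset skips it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LimitOffsetToy {
  public static void main(String[] args) {
    List<String> rows = Arrays.asList("r1", "r2", "r3");

    // Pushed-down limit: the lower stage only produces one row.
    List<String> fromLowerStage =
        rows.stream().limit(1).collect(Collectors.toList());

    // Original "limit 1 offset 1" kept on top: skip 1, then take 1.
    List<String> finalResult =
        fromLowerStage.stream().skip(1).limit(1).collect(Collectors.toList());

    System.out.println(finalResult);  // [] -- the offset consumed the only row
  }
}
```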





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 opened a new pull request #1340: HIVE-23953 : Use task counter information to compute keycount during hashtable loading

2020-07-30 Thread GitBox


maheshk114 opened a new pull request #1340:
URL: https://github.com/apache/hive/pull/1340


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zeroflag commented on a change in pull request #1323: HIVE-23723. Limit operator pushdown through LOJ (amagyar)

2020-07-30 Thread GitBox


zeroflag commented on a change in pull request #1323:
URL: https://github.com/apache/hive/pull/1323#discussion_r463107179



##
File path: ql/src/test/results/clientpositive/llap/input14_limit.q.out
##
@@ -75,19 +75,19 @@ STAGE PLANS:
 Reducer 2 
 Execution mode: vectorized, llap
 Reduce Operator Tree:
-  Select Operator
-expressions: VALUE._col0 (type: string), VALUE._col1 (type: 
string)
-outputColumnNames: _col0, _col1
-Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 20
-  Statistics: Num rows: 20 Data size: 3560 Basic stats: 
COMPLETE Column stats: COMPLETE
-  Top N Key Operator
-sort order: +
-keys: _col0 (type: string)
-null sort order: a
+  Limit

Review comment:
   I agree, I opened HIVE-23957 about it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] maheshk114 merged pull request #1293: HIVE-23894: SubmitDag should not be retried incase of query cancel

2020-07-30 Thread GitBox


maheshk114 merged pull request #1293:
URL: https://github.com/apache/hive/pull/1293


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvargacl opened a new pull request #1339: HIVE-23956: Delete delta fileIds should be pushed execution

2020-07-30 Thread GitBox


pvargacl opened a new pull request #1339:
URL: https://github.com/apache/hive/pull/1339


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kgyrtkirk commented on pull request #1252: HIVE-23847: Extracting hive-parser module broke exec jar upload in tez

2020-07-30 Thread GitBox


kgyrtkirk commented on pull request #1252:
URL: https://github.com/apache/hive/pull/1252#issuecomment-666403793


   we should get this in sooner rather than later - I've also bumped into this issue and had to figure it out again... I didn't connect the dots



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kgyrtkirk merged pull request #1272: HIVE-23727: Improve SQLOperation log handling when canceling background

2020-07-30 Thread GitBox


kgyrtkirk merged pull request #1272:
URL: https://github.com/apache/hive/pull/1272


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] soumyakanti3578 commented on a change in pull request #1317: HIVE-23949: Introduce caching layer in HS2 to accelerate query compilation

2020-07-30 Thread GitBox


soumyakanti3578 commented on a change in pull request #1317:
URL: https://github.com/apache/hive/pull/1317#discussion_r463036109



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3665,33 +3678,39 @@ public boolean dropPartition(String dbName, String 
tableName, List parti
* @return list of partition objects
*/
   public List getPartitions(Table tbl) throws HiveException {
-if (tbl.isPartitioned()) {
-  List tParts;
-  try {
-GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
-req.setTblName(tbl.getTableName());
-req.setDbName(tbl.getDbName());
-req.setUserName(getUserName());
-req.setMaxParts((short) -1);
-req.setGroupNames(getGroupNames());
-if (AcidUtils.isTransactionalTable(tbl)) {
-  ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
-  req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
-}
-GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
-tParts = res.getPartitions();
+long t1 = System.nanoTime();
+try {
+  if (tbl.isPartitioned()) {
+List tParts;
+try {
+  GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
+  req.setTblName(tbl.getTableName());
+  req.setDbName(tbl.getDbName());
+  req.setUserName(getUserName());
+  req.setMaxParts((short) -1);
+  req.setGroupNames(getGroupNames());
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
+req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
+  }
+  GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
+  tParts = res.getPartitions();
 
-  } catch (Exception e) {
-LOG.error(StringUtils.stringifyException(e));
-throw new HiveException(e);
-  }
-  List parts = new ArrayList<>(tParts.size());
-  for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
-parts.add(new Partition(tbl, tpart));
+} catch (Exception e) {
+  LOG.error(StringUtils.stringifyException(e));
+  throw new HiveException(e);
+}
+List parts = new ArrayList<>(tParts.size());
+for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
+  parts.add(new Partition(tbl, tpart));
+}
+return parts;
+  } else {
+return Collections.singletonList(new Partition(tbl));
   }
-  return parts;
-} else {
-  return Collections.singletonList(new Partition(tbl));
+} finally {

Review comment:
   Since there are many instances of 2., I am moving all `PerfLogEnd` calls to `finally` blocks for now, which will probably be easier to maintain in the future because of the uniformity.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] soumyakanti3578 commented on a change in pull request #1317: HIVE-23949: Introduce caching layer in HS2 to accelerate query compilation

2020-07-30 Thread GitBox


soumyakanti3578 commented on a change in pull request #1317:
URL: https://github.com/apache/hive/pull/1317#discussion_r463036109



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3665,33 +3678,39 @@ public boolean dropPartition(String dbName, String 
tableName, List parti
* @return list of partition objects
*/
   public List getPartitions(Table tbl) throws HiveException {
-if (tbl.isPartitioned()) {
-  List tParts;
-  try {
-GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
-req.setTblName(tbl.getTableName());
-req.setDbName(tbl.getDbName());
-req.setUserName(getUserName());
-req.setMaxParts((short) -1);
-req.setGroupNames(getGroupNames());
-if (AcidUtils.isTransactionalTable(tbl)) {
-  ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
-  req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
-}
-GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
-tParts = res.getPartitions();
+long t1 = System.nanoTime();
+try {
+  if (tbl.isPartitioned()) {
+List tParts;
+try {
+  GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
+  req.setTblName(tbl.getTableName());
+  req.setDbName(tbl.getDbName());
+  req.setUserName(getUserName());
+  req.setMaxParts((short) -1);
+  req.setGroupNames(getGroupNames());
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
+req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
+  }
+  GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
+  tParts = res.getPartitions();
 
-  } catch (Exception e) {
-LOG.error(StringUtils.stringifyException(e));
-throw new HiveException(e);
-  }
-  List parts = new ArrayList<>(tParts.size());
-  for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
-parts.add(new Partition(tbl, tpart));
+} catch (Exception e) {
+  LOG.error(StringUtils.stringifyException(e));
+  throw new HiveException(e);
+}
+List parts = new ArrayList<>(tParts.size());
+for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
+  parts.add(new Partition(tbl, tpart));
+}
+return parts;
+  } else {
+return Collections.singletonList(new Partition(tbl));
   }
-  return parts;
-} else {
-  return Collections.singletonList(new Partition(tbl));
+} finally {

Review comment:
   Since there are many instances of 2., I am moving all `PerfLogEnd` calls to `finally` blocks for now, which will probably be easier to maintain in the future.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zeroflag opened a new pull request #1338: HIVE-23937. Take null ordering into consideration when pushing TNK through inner joins (amagyar)

2020-07-30 Thread GitBox


zeroflag opened a new pull request #1338:
URL: https://github.com/apache/hive/pull/1338


   Work in progress
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] jcamachor merged pull request #1305: HIVE-23892: Remove interpretation for character RexLiteral

2020-07-30 Thread GitBox


jcamachor merged pull request #1305:
URL: https://github.com/apache/hive/pull/1305


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] letsflyinthesky commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current po

2020-07-30 Thread GitBox


letsflyinthesky commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666349043


   Could you give me some advice on why it should be removed? Maybe my commit doesn't resolve the problem, or the way I made the commit doesn't conform to the contribution guidelines.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] dengzhhu653 commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current positi

2020-07-30 Thread GitBox


dengzhhu653 commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666340785


   Hey @letsflyinthesky, you can move your fixes onto the master branch. There are some detailed and useful guidelines on how to contribute at https://cwiki.apache.org/confluence/display/Hive/HowToContribute.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-07-30 Thread GitBox


zabetak commented on pull request #1315:
URL: https://github.com/apache/hive/pull/1315#issuecomment-666334140


   > @jcamachor @zabetak I was thinking of changing all the existing tpcds 
queries to replace literals with `?` and run explain on tpcds queries (using 
TestTezPerfCliDriver). This should provide us with some good test coverage. 
What I am undecided on is if it worth pushing it to hive repo and have a 
separate clidriver testing execute/prepare for tpcds. What do you suggest?
   
   If we want to avoid regressions I guess having the queries in the repo makes 
sense. Apart from that I guess that not all of the tpcds queries are needed to 
ensure code coverage. Possibly after testing you will end up with a subset that is sufficient.
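   For readers who haven't followed the feature: the idea is that the literals in a predicate become bind parameters. A rough client-side sketch over plain JDBC (the URL, table, and value are placeholders, and this does not show how the CliDriver-based qtests would drive it):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ParameterizedQuerySketch {
  public static void main(String[] args) throws Exception {
    // Connection URL, table and column names are placeholders.
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         PreparedStatement ps = conn.prepareStatement(
             "select count(*) from store_sales where ss_quantity > ?")) {
      ps.setInt(1, 10);                       // the literal becomes a bind value
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getLong(1));
        }
      }
    }
  }
}
```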



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kasakrisz commented on a change in pull request #1323: HIVE-23723. Limit operator pushdown through LOJ (amagyar)

2020-07-30 Thread GitBox


kasakrisz commented on a change in pull request #1323:
URL: https://github.com/apache/hive/pull/1323#discussion_r462852316



##
File path: ql/src/test/results/clientpositive/llap/explainuser_1.q.out
##
@@ -124,27 +124,25 @@ Stage-3
   <-Reducer 2 [SIMPLE_EDGE] llap
 File Output Operator [FS_7]
   table:{"name:":"default.src_orc_merge_test_part_n1"}
-  Select Operator [SEL_6] (rows=100 width=95)
+  Select Operator [SEL_4] (rows=100 width=178)
 Output:["_col0","_col1"]
 Limit [LIM_5] (rows=100 width=178)
   Number of rows:100
-  Select Operator [SEL_4] (rows=100 width=178)
-Output:["_col0","_col1"]
-  <-Map 1 [CUSTOM_SIMPLE_EDGE] llap
-PARTITION_ONLY_SHUFFLE [RS_3]
-  Limit [LIM_2] (rows=100 width=178)
-Number of rows:100
-Select Operator [SEL_1] (rows=500 width=178)
-  Output:["_col0","_col1"]
-  TableScan [TS_0] (rows=500 width=178)
-
default@src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
+<-Map 1 [CUSTOM_SIMPLE_EDGE] llap
+  PARTITION_ONLY_SHUFFLE [RS_3]
+Limit [LIM_2] (rows=100 width=178)

Review comment:
   Why couldn't it be pushed through `[SEL_1]`?

##
File path: ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out
##
@@ -108,7 +111,7 @@ limit 1
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-0  val_0   0   val_0
+238val_238 238 val_238

Review comment:
   Does `-- SORT_QUERY_RESULTS` help avoid the result set change? 

##
File path: ql/src/test/results/clientpositive/llap/input14_limit.q.out
##
@@ -75,19 +75,19 @@ STAGE PLANS:
 Reducer 2 
 Execution mode: vectorized, llap
 Reduce Operator Tree:
-  Select Operator
-expressions: VALUE._col0 (type: string), VALUE._col1 (type: 
string)
-outputColumnNames: _col0, _col1
-Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 20
-  Statistics: Num rows: 20 Data size: 3560 Basic stats: 
COMPLETE Column stats: COMPLETE
-  Top N Key Operator
-sort order: +
-keys: _col0 (type: string)
-null sort order: a
+  Limit

Review comment:
   It is worth investigating why we end up with a plan where the parent of a `TNK` is a `Limit`, but that can be a follow-up patch since it is out of the scope of this patch.

##
File path: ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out
##
@@ -1207,7 +1223,6 @@ limit 1 offset 1
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-0  val_0   0   val_0

Review comment:
   Result set disappeared.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] aasha commented on a change in pull request #1329: HIVE-23897 : Create a common Retry Interface for replication

2020-07-30 Thread GitBox


aasha commented on a change in pull request #1329:
URL: https://github.com/apache/hive/pull/1329#discussion_r462835685



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/util/Retryable.java
##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.util;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.utils.SecurityUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Class to implement any retry logic in case of exceptions.
+ */
+public class Retryable {
+  private static long MINIMUM_DELAY_IN_SEC = 1;
+
+  private long totalDurationInSeconds;
+  private List> retryOn;
+  private List> failOn;
+  private long initialDelayInSeconds;
+  private long maxRetryDelayInSeconds;
+  private double backOff;
+  private int maxJitterInSeconds;
+
+  private Retryable() {
+this.retryOn = new ArrayList<>();
+this.failOn = new ArrayList<>();
+this.initialDelayInSeconds = 
HiveConf.toTime(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.defaultStrVal,
+  HiveConf.getDefaultTimeUnit(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY), 
TimeUnit.SECONDS);
+this.maxRetryDelayInSeconds = 
HiveConf.toTime(HiveConf.ConfVars.REPL_RETRY_MAX_DELAY_BETWEEN_RETRIES.defaultStrVal,
+  
HiveConf.getDefaultTimeUnit(HiveConf.ConfVars.REPL_RETRY_MAX_DELAY_BETWEEN_RETRIES),
 TimeUnit.SECONDS);
+this.backOff = 
HiveConf.ConfVars.REPL_RETRY_BACKOFF_COEFFICIENT.defaultFloatVal;
+this.maxJitterInSeconds = (int) 
HiveConf.toTime(HiveConf.ConfVars.REPL_RETRY_JITTER.defaultStrVal,
+  HiveConf.getDefaultTimeUnit(HiveConf.ConfVars.REPL_RETRY_JITTER), 
TimeUnit.SECONDS);
+this.totalDurationInSeconds = 
HiveConf.toTime(HiveConf.ConfVars.REPL_RETRY_TOTAL_DURATION.defaultStrVal,
+  
HiveConf.getDefaultTimeUnit(HiveConf.ConfVars.REPL_RETRY_TOTAL_DURATION), 
TimeUnit.SECONDS);;
+  }
+
+  public static Builder builder() {
+return new Builder();
+  }
+
+  public  T executeCallable(Callable callable) throws Exception {
+long startTime = System.currentTimeMillis();
+long delay = this.initialDelayInSeconds;
+Exception currentCapturedException = null;
+while(true) {
+  try {
+if (UserGroupInformation.isSecurityEnabled()) {
+  SecurityUtils.reloginExpiringKeytabUser();
+  return 
UserGroupInformation.getLoginUser().doAs((PrivilegedExceptionAction) () -> 
callable.call());
+} else {
+  return callable.call();
+}
+  } catch (Exception e) {
+if (this.failOn.stream().noneMatch(k -> e.getClass().equals(k))
+  && this.retryOn.stream().anyMatch(k -> 
e.getClass().isAssignableFrom(k))) {
+  if (elapsedTimeInSeconds(startTime) + delay > 
this.totalDurationInSeconds) {
+// case where waiting would go beyond max duration. So throw 
exception and return
+throw e;
+  }
+  sleep(delay);
+  //retry case. compute next sleep time
+  delay = getNextDelay(delay, currentCapturedException, e);
+  // reset current captured exception.
+  currentCapturedException = e;
+} else {
+  // Exception cannot be retried on. Throw exception and return
+  throw e;
+}
+  }
+}
+  }
+
+  private void sleep(long seconds) {
+try {
+  Thread.sleep(seconds * 1000);
+} catch (InterruptedException e) {
+  // no-op.. just proceed
+}
+  }
+
+  private long getNextDelay(long currentDelay, final Exception 
previousException, final Exception currentException) {
+if (previousException != null && 
!previousException.getClass().equals(currentException.getClass())) {
+  // New exception encountered. Returning initial delay for next retry.
+  return this.initialDelayInSeconds;
+}
+
+if (currentDelay <= 0) { // in case initial delay was set to 0.
+  currentDelay = MINIMUM_DELAY_IN_SEC;
+}
+
+currentDelay *= this.backOff;
+

[GitHub] [hive] vineetgarg02 commented on pull request #1315: [HIVE-23951] Support parameterized queries in WHERE/HAVING clause

2020-07-30 Thread GitBox


vineetgarg02 commented on pull request #1315:
URL: https://github.com/apache/hive/pull/1315#issuecomment-665947086


   @jcamachor @zabetak I was thinking of changing all the existing tpcds 
queries to replace literals with `?` and run explain on tpcds queries (using 
TestTezPerfCliDriver). This should provide us with some good test coverage. 
What I am undecided on is whether it is worth pushing it to the hive repo and having a separate clidriver testing execute/prepare for tpcds. What do you suggest?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] sam-an-cloudera commented on a change in pull request #1221: HIVE-23786: HMS server side filter with Ranger

2020-07-30 Thread GitBox


sam-an-cloudera commented on a change in pull request #1221:
URL: https://github.com/apache/hive/pull/1221#discussion_r462496195



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java
##
@@ -312,5 +558,9 @@ private String getCurrentUser() {
   private String getCurrentUser(HiveMetaStoreAuthorizableEvent 
authorizableEvent) {
 return authorizableEvent.getAuthzContext().getUGI().getShortUserName();
   }
+
+  private UserGroupInformation getUGI() throws IOException {
+  return UserGroupInformation.getCurrentUser();

Review comment:
   All corrected. How did you spot them so quickly? I wish I had a tool to find these before I submit. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] jcamachor commented on a change in pull request #1317: HIVE-23949: Introduce caching layer in HS2 to accelerate query compilation

2020-07-30 Thread GitBox


jcamachor commented on a change in pull request #1317:
URL: https://github.com/apache/hive/pull/1317#discussion_r462584582



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3665,33 +3678,39 @@ public boolean dropPartition(String dbName, String 
tableName, List parti
* @return list of partition objects
*/
   public List getPartitions(Table tbl) throws HiveException {
-if (tbl.isPartitioned()) {
-  List tParts;
-  try {
-GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
-req.setTblName(tbl.getTableName());
-req.setDbName(tbl.getDbName());
-req.setUserName(getUserName());
-req.setMaxParts((short) -1);
-req.setGroupNames(getGroupNames());
-if (AcidUtils.isTransactionalTable(tbl)) {
-  ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
-  req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
-}
-GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
-tParts = res.getPartitions();
+long t1 = System.nanoTime();
+try {
+  if (tbl.isPartitioned()) {
+List tParts;
+try {
+  GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
+  req.setTblName(tbl.getTableName());
+  req.setDbName(tbl.getDbName());
+  req.setUserName(getUserName());
+  req.setMaxParts((short) -1);
+  req.setGroupNames(getGroupNames());
+  if (AcidUtils.isTransactionalTable(tbl)) {
+ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
+req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
+  }
+  GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
+  tParts = res.getPartitions();
 
-  } catch (Exception e) {
-LOG.error(StringUtils.stringifyException(e));
-throw new HiveException(e);
-  }
-  List parts = new ArrayList<>(tParts.size());
-  for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
-parts.add(new Partition(tbl, tpart));
+} catch (Exception e) {
+  LOG.error(StringUtils.stringifyException(e));
+  throw new HiveException(e);
+}
+List parts = new ArrayList<>(tParts.size());
+for (org.apache.hadoop.hive.metastore.api.Partition tpart : tParts) {
+  parts.add(new Partition(tbl, tpart));
+}
+return parts;
+  } else {
+return Collections.singletonList(new Partition(tbl));
   }
-  return parts;
-} else {
-  return Collections.singletonList(new Partition(tbl));
+} finally {

Review comment:
   I did not catch this comment in the review. I was taking a look at the 
code and I think there are two problems with current approach (sorry about the 
back and forth).
   1) Currently we are logging information twice (PerfLogger already has its 
own logging mechanism when logging is set to debug level). We should remove the 
logging lines outside of it.
   2) I think we should have left the PerfLogEnd in the `finally` block. I saw 
in some methods that we catch an exception and ignore it, which could lead to 
leaks since we do not call PerfLogEnd. Even if currently this may not be a 
problem, it would be difficult to verify that nothing changes in the future 
there. I think the safest path is just to make that call in the `finally` block 
to make sure it is always called.
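   In other words, the shape being asked for is the classic begin-in-try / end-in-finally pairing, so the end call survives both early returns and exceptions. A self-contained sketch with a stand-in logger (not the real PerfLogger API):

```java
// Schematic only: a stand-in "perf logger" to show the begin-in-try / end-in-finally shape.
public class PerfLogPattern {
  static void perfLogBegin(String method) { System.out.println("BEGIN " + method); }
  static void perfLogEnd(String method)   { System.out.println("END   " + method); }

  static int getPartitions(boolean fail) {
    perfLogBegin("getPartitions");
    try {
      if (fail) {
        throw new RuntimeException("metastore call failed");
      }
      return 42; // pretend partition count
    } finally {
      // Runs on both the normal and the exceptional path, so the END event can never leak.
      perfLogEnd("getPartitions");
    }
  }

  public static void main(String[] args) {
    System.out.println(getPartitions(false));
    try { getPartitions(true); } catch (RuntimeException ignored) { }
  }
}
```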

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3665,33 +3678,39 @@ public boolean dropPartition(String dbName, String 
tableName, List parti
* @return list of partition objects
*/
   public List getPartitions(Table tbl) throws HiveException {
-if (tbl.isPartitioned()) {
-  List tParts;
-  try {
-GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
-req.setTblName(tbl.getTableName());
-req.setDbName(tbl.getDbName());
-req.setUserName(getUserName());
-req.setMaxParts((short) -1);
-req.setGroupNames(getGroupNames());
-if (AcidUtils.isTransactionalTable(tbl)) {
-  ValidWriteIdList validWriteIdList = 
getValidWriteIdList(tbl.getDbName(), tbl.getTableName());
-  req.setValidWriteIdList(validWriteIdList != null ? 
validWriteIdList.toString() : null);
-}
-GetPartitionsPsWithAuthResponse res = 
getMSC().listPartitionsWithAuthInfoRequest(req);
-tParts = res.getPartitions();
+long t1 = System.nanoTime();
+try {
+  if (tbl.isPartitioned()) {
+List tParts;
+try {
+  GetPartitionsPsWithAuthRequest req = new 
GetPartitionsPsWithAuthRequest();
+  req.setTblName(

[GitHub] [hive] aasha commented on a change in pull request #1329: HIVE-23897 : Create a common Retry Interface for replication

2020-07-30 Thread GitBox


aasha commented on a change in pull request #1329:
URL: https://github.com/apache/hive/pull/1329#discussion_r462697552



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/util/Retryable.java
##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.util;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.utils.SecurityUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+
+import java.security.PrivilegedExceptionAction;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Class to implement any retry logic in case of exceptions.
+ */
+public class Retryable {
+  private static long MINIMUM_DELAY_IN_SEC = 1;
+
+  private long totalDurationInSeconds;
+  private List> retryOn;
+  private List> failOn;
+  private long initialDelayInSeconds;
+  private long maxRetryDelayInSeconds;
+  private double backOff;
+  private int maxJitterInSeconds;
+
+  private Retryable() {
+this.retryOn = new ArrayList<>();
+this.failOn = new ArrayList<>();
+this.initialDelayInSeconds = 
HiveConf.toTime(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.defaultStrVal,
+  HiveConf.getDefaultTimeUnit(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY), 
TimeUnit.SECONDS);

Review comment:
   Didn't get this comment. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] HunterL edited a comment on pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


HunterL edited a comment on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-665824148







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] soumyakanti3578 commented on a change in pull request #1317: HIVE-23949: Introduce caching layer in HS2 to accelerate query compilation

2020-07-30 Thread GitBox


soumyakanti3578 commented on a change in pull request #1317:
URL: https://github.com/apache/hive/pull/1317#discussion_r462602014



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientWithLocalCache.java
##
@@ -0,0 +1,264 @@
+package org.apache.hadoop.hive.metastore;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.PartitionsByExprRequest;
+import org.apache.hadoop.hive.metastore.api.PartitionsByExprResult;
+import org.apache.hadoop.hive.metastore.api.PartitionsSpecByExprResult;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator;
+import 
org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator.ObjectEstimator;
+import org.apache.thrift.TException;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Objects;
+
+public class HiveMetaStoreClientWithLocalCache extends HiveMetaStoreClient {
+
+  private static Cache mscLocalCache = null;
+  //TODO: initialize in the init method
+  private static boolean IS_CACHE_ENABLED;
+  private static long MAX_SIZE;
+  private static boolean RECORD_STATS;
+  private static HashMap, ObjectEstimator> sizeEstimator = null;
+  private static String cacheObjName = null;
+
+  public static synchronized void init() {
+if (mscLocalCache != null) return; // init cache only once
+Configuration metaConf = MetastoreConf.newMetastoreConf();
+LOG.debug("Initializing local cache in HiveMetaStoreClient...");
+MAX_SIZE = MetastoreConf.getSizeVar(metaConf, 
MetastoreConf.ConfVars.MSC_CACHE_MAX_SIZE);
+IS_CACHE_ENABLED = MetastoreConf.getBoolVar(metaConf, 
MetastoreConf.ConfVars.MSC_CACHE_ENABLED);
+RECORD_STATS = MetastoreConf.getBoolVar(metaConf, 
MetastoreConf.ConfVars.MSC_CACHE_RECORD_STATS);
+initSizeEstimator();
+initCache();
+LOG.debug("Local cache initialized in HiveMetaStoreClient: " + 
mscLocalCache);
+  }
+
+  public HiveMetaStoreClientWithLocalCache(Configuration conf) throws 
MetaException {
+this(conf, null, true);
+  }
+
+  public HiveMetaStoreClientWithLocalCache(Configuration conf, 
HiveMetaHookLoader hookLoader) throws MetaException {
+this(conf, hookLoader, true);
+  }
+
+  public HiveMetaStoreClientWithLocalCache(Configuration conf, 
HiveMetaHookLoader hookLoader, Boolean allowEmbedded) throws MetaException {
+super(conf, hookLoader, allowEmbedded);
+  }
+
+  private static void initSizeEstimator() {
+sizeEstimator = new HashMap<>();
+IncrementalObjectSizeEstimator.createEstimators(CacheKey.class, 
sizeEstimator);
+Arrays.stream(KeyType.values()).forEach(e -> {
+  IncrementalObjectSizeEstimator.createEstimators(e.keyClass, 
sizeEstimator);
+  IncrementalObjectSizeEstimator.createEstimators(e.valueClass, 
sizeEstimator);}
+);
+  }
+
+  /**
+   * KeyType is used to differentiate the request types. More types can be 
added in future.
+   */
+  public enum KeyType {
+PARTITIONS_BY_EXPR(PartitionsByExprRequest.class, 
PartitionsByExprResult.class),
+PARTITIONS_SPEC_BY_EXPR(PartitionsByExprRequest.class, 
PartitionsSpecByExprResult.class);
+
+private final Class keyClass;
+private final Class valueClass;
+
+KeyType(Class keyClass, Class valueClass) {
+  this.keyClass = keyClass;
+  this.valueClass = valueClass;
+}
+  }
+
+  /**
+   * CacheKey objects are used as key for the cache.
+   */
+  public static class CacheKey{
+KeyType IDENTIFIER;
+Object obj;
+
+public CacheKey(KeyType IDENTIFIER, Object obj) {
+  this.IDENTIFIER = IDENTIFIER;
+  this.obj = obj;
+}
+
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (o == null || getClass() != o.getClass()) {
+return false;
+  }
+  CacheKey cacheKey = (CacheKey) o;
+  return IDENTIFIER == cacheKey.IDENTIFIER &&
+  Objects.equals(obj, cacheKey.obj);
+}
+
+@Override
+public int hashCode() {
+  return Objects.hash(IDENTIFIER, obj);
+}
+  }
+
+  private static int getWeight(CacheKey key, Object val) {
+ObjectEstimator keySizeEstimator = sizeEstimator.get(key.getClass());
+ObjectEstimator valSizeEstimator = 
sizeEstimator.get(key.IDENTIFIER.valueClass);
+int keySize = keySizeEstimator.estimate(key, sizeEstimator);
+int valSize = valSizeEstimator.estimate(val, sizeEstimator);
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Cache entry weight - key: {}, value: {}, total: {}", keySize, 
valSize, keySize + valSize);
+}
+return keySize + valSize;
+  }
+
+  private Object load(CacheKey key) {
+try {
+  return getResultObject(key);
+} catch (TException e) {
+  throw ne

[GitHub] [hive] HunterL commented on pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


HunterL commented on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-665824148


   So looking through the Q tests (specifically 
`auto_join_reordering_values.q.out`) we have
   `explain select s.s_store_sk from store_n0 s join store_sales_n0 ss on 
(s.s_store_sk = ss.ss_store_sk) join store_n0 s1 on (s1.s_store_sk = 
ss.ss_store_sk) where s.s_floor_space > 1000`
   
   In the explain plan this produces
   `filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or 
s_store_sk is not null)`
   This filter expression can be reduced down to `s_store_sk is not null` 
   e.g. ([P && Q] || Q) => Q
   This isn't correct since it's going to ignore the `where s.s_floor_space > 1000`
   
   It seems to me that the correct behavior would be to either...
   1. Continue to combine them naively but with an `and` instead of an `or` and 
allow something downstream to optimize it out.
   e.g. ([P && Q] && Q) which will get reduced down to (P && Q)
   2. Optimize the statement here so that when they are combined you just get 
(P && Q)
   
   There are similar issues in other Q tests, but I chose this one for simplicity;
   if this isn't clear I'd be happy to clarify more.






[GitHub] [hive] pkumarsinha commented on a change in pull request #1329: HIVE-23897 : Create a common Retry Interface for replication

2020-07-30 Thread GitBox


pkumarsinha commented on a change in pull request #1329:
URL: https://github.com/apache/hive/pull/1329#discussion_r462125753



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -617,6 +617,22 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Name of the source cluster for the replication."),
 REPL_TARGET_CLUSTER_NAME("hive.repl.target.cluster.name", null,
 "Name of the target cluster for the replication."),
+REPL_RETRY_INTIAL_DELAY("hive.repl.retry.initial.delay", "60s",
+  new TimeValidator(TimeUnit.SECONDS),
+  "Initial Delay before retry starts."),
+REPL_RETRY_BACKOFF_COEFFICIENT("hive.repl.retry.backoff.coefficient", 1.2f,
+  "The backoff coefficient for exponential retry delay between retries. " +
+    "Previous Delay * Backoff Coefficient will determine the next retry interval"),
+REPL_RETRY_JITTER("hive.repl.retry.jitter", "30s", new TimeValidator(TimeUnit.SECONDS),
+  "A random jitter to be applied to avoid all retries happening at the same time."),
+REPL_RETRY_MAX_DELAY_BETWEEN_RETRIES("hive.repl.retry.max.delay.between.retries", "60m",
+  new TimeValidator(TimeUnit.MINUTES),
+  "Maximum allowed retry delay in seconds after including exponential backoff. " +

Review comment:
   The default is in minutes but the comment says seconds. Should we change this?
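
   For a rough feel of these defaults, here is a small, self-contained sketch (illustrative only, not the patch's Retryable implementation) of how the delay would grow with an initial delay of 60s and a backoff coefficient of 1.2, capped at the configured maximum:
   ```java
   // Illustrative only: prints the delay sequence implied by the quoted defaults;
   // jitter (hive.repl.retry.jitter) would add up to 30s of randomness on top.
   public class BackoffSketch {
     public static void main(String[] args) {
       double delaySeconds = 60;          // hive.repl.retry.initial.delay
       double backoff = 1.2;              // hive.repl.retry.backoff.coefficient
       double maxDelaySeconds = 60 * 60;  // hive.repl.retry.max.delay.between.retries (60m)
       for (int attempt = 1; attempt <= 8; attempt++) {
         System.out.printf("retry %d after ~%.0f s%n", attempt, delaySeconds);
         delaySeconds = Math.min(delaySeconds * backoff, maxDelaySeconds);
       }
     }
   }
   ```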

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/RetryingClientTimeBased.java
##
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.atlas;
+
+import com.sun.jersey.api.client.UniformInterfaceException;
+import org.apache.atlas.AtlasServiceException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Random;
+import java.util.concurrent.Callable;
+
+/**
+ * Implement retry logic for service calls.
+ */
+public class RetryingClientTimeBased {
+  private static long MINIMUM_DELAY_IN_SEC = 1;
+  private static final Logger LOG = LoggerFactory.getLogger(RetryingClientTimeBased.class);
+  private static final String ERROR_MESSAGE_NO_ENTITIES = "no entities to create/update";
+  private static final String ERROR_MESSAGE_IN_PROGRESS = "import or export is in progress";
+  private static final String ATLAS_ERROR_CODE_IMPORT_EMPTY_ZIP = "empty ZIP file";
+  protected long totalDurationInSeconds;
+  protected long initialDelayInSeconds;
+  protected long maxRetryDelayInSeconds;
+  protected double backOff;
+  protected int maxJitterInSeconds;
+
+  protected <T> T invokeWithRetry(Callable<T> func, T defaultReturnValue) throws Exception {
+    long startTime = System.currentTimeMillis();
+    long delay = this.initialDelayInSeconds;
+    while (elapsedTimeInSeconds(startTime) + delay > this.totalDurationInSeconds) {
+      try {
+        LOG.debug("Retrying method: {}", func.getClass().getName(), null);
+        return func.call();
+      } catch (Exception e) {
+        if (processImportExportLockException(e, delay)) {
+          //retry case. compute next sleep time
+          delay = getNextDelay(delay);
+          continue;
+        }
+        if (processInvalidParameterException(e)) {
+          return null;
+        }
+        LOG.error(func.getClass().getName(), e);
+        throw new Exception(e);
+      }
+    }
+    return defaultReturnValue;
+  }
+
+  private long getNextDelay(long currentDelay) {
+    if (currentDelay <= 0) { // in case initial delay was set to 0.
+      currentDelay = MINIMUM_DELAY_IN_SEC;
+    }
+

Review comment:
   nit: consider formatting this file
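
   For readers unfamiliar with the pattern, here is a self-contained sketch of a time-budgeted retry loop with exponential backoff and jitter; it is independent of the class above (the names and values are made up) and only illustrates the general technique:
   ```java
   import java.util.Random;
   import java.util.concurrent.Callable;

   // Generic illustration of a time-budgeted retry with exponential backoff and jitter.
   public final class TimeBudgetedRetry {
     private static final Random RANDOM = new Random();

     public static <T> T callWithRetry(Callable<T> action, long totalBudgetMillis,
         long initialDelayMillis, double backoff, long maxDelayMillis, int maxJitterMillis)
         throws Exception {
       long deadline = System.currentTimeMillis() + totalBudgetMillis;
       long delay = initialDelayMillis;
       Exception last = null;
       while (System.currentTimeMillis() < deadline) {
         try {
           return action.call();
         } catch (Exception e) {
           last = e;  // remember, then retry until the time budget is spent
           long jitter = maxJitterMillis > 0 ? RANDOM.nextInt(maxJitterMillis) : 0;
           Thread.sleep(Math.min(delay, maxDelayMillis) + jitter);
           delay = (long) (delay * backoff);  // exponential backoff
         }
       }
       throw last != null ? last : new IllegalStateException("retry budget was zero");
     }
   }
   ```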

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/util/Retryable.java
##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the Licens

[GitHub] [hive] kasakrisz commented on pull request #1324: HIVE-23939: SharedWorkOptimizer: take the union of columns in mergeable TableScans

2020-07-30 Thread GitBox


kasakrisz commented on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-666155487


   @HunterL
   Thanks for reviewing this patch.
   The expression
   ```
   filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null)
   ```
   is a result of merging two `TableScanOperator`s. Both of them scan the same
   table (`alias: s`), but they had different `filterExpr`s:
   TS1: ((s_floor_space > 1000) and s_store_sk is not null)
   TS2: s_store_sk is not null
   
   SharedWorkOptimizer naively combined the filter expressions using `or` because we
   need the union of the records produced by both TS. You are right that in this
   particular case the filter expression could be reduced to `s_store_sk is not null`.
   
   The new TS has two children
   ```
   TableScan
     alias: s
     filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null) (type: boolean)
     Filter Operator
       predicate: ((s_floor_space > 1000) and s_store_sk is not null) (type: boolean)
       ...
     Filter Operator
       predicate: s_store_sk is not null (type: boolean)
       ...
   ```
   Both of them are `Filter` operators, which are the roots of the subtrees that
   broadcast the proper subset of records to each reducer edge (Reducer 2 and
   Reducer 3).
   
   If `and` were used to combine the filter expressions of the TS operators, the
   branch which does not have the filter `s_floor_space > 1000` would lose a
   subset of records.
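
   To make the union argument concrete, here is a toy, self-contained sketch (plain Java streams over made-up data, not Hive internals) showing that the shared scan has to keep the OR of the branch predicates while each branch still applies its own filter below the scan:
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.function.Predicate;
   import java.util.stream.Collectors;

   public class SharedScanUnionSketch {
     public static void main(String[] args) {
       // Made-up floor-space values standing in for rows of store_n0.
       List<Integer> rows = Arrays.asList(500, 2000, 8000, 300);

       Predicate<Integer> p = fs -> fs > 1000;  // branch 1: s_floor_space > 1000
       Predicate<Integer> q = fs -> fs != null; // branch 2: s_store_sk is not null (stand-in)

       // Shared TableScan filter: (P && Q) || Q -- the union of what the two branches need.
       List<Integer> scanned = rows.stream().filter(p.and(q).or(q)).collect(Collectors.toList());

       // Each branch re-applies its own predicate below the shared scan.
       List<Integer> branch1 = scanned.stream().filter(p.and(q)).collect(Collectors.toList());
       List<Integer> branch2 = scanned.stream().filter(q).collect(Collectors.toList());

       System.out.println("branch1: " + branch1); // [2000, 8000]
       System.out.println("branch2: " + branch2); // [500, 2000, 8000, 300]
       // Had the scan filtered on (P && Q) alone, branch2 would have lost 500 and 300.
     }
   }
   ```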






[GitHub] [hive] github-actions[bot] closed pull request #1028: HIVE-23519 : Read Ranger Configs from Classpath

2020-07-30 Thread GitBox


github-actions[bot] closed pull request #1028:
URL: https://github.com/apache/hive/pull/1028


   






[GitHub] [hive] dengzhhu653 closed pull request #1033: HIVE-23546: Skip authorization when user is a superuser

2020-07-30 Thread GitBox


dengzhhu653 closed pull request #1033:
URL: https://github.com/apache/hive/pull/1033


   






[GitHub] [hive] pvary commented on a change in pull request #1221: HIVE-23786: HMS server side filter with Ranger

2020-07-30 Thread GitBox


pvary commented on a change in pull request #1221:
URL: https://github.com/apache/hive/pull/1221#discussion_r462725998



##
File path: ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java
##
@@ -312,5 +558,9 @@ private String getCurrentUser() {
   private String getCurrentUser(HiveMetaStoreAuthorizableEvent authorizableEvent) {
 return authorizableEvent.getAuthzContext().getUGI().getShortUserName();
   }
+
+  private UserGroupInformation getUGI() throws IOException {
+  return UserGroupInformation.getCurrentUser();

Review comment:
   Too many years seeing Hive code. Basically it hurts my eyes :)








[GitHub] [hive] mustafaiman opened a new pull request #1337: HIVE-23952:

2020-07-30 Thread GitBox


mustafaiman opened a new pull request #1337:
URL: https://github.com/apache/hive/pull/1337


   Change-Id: If398aba0130f28c17d18340c20ba681c278f
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   


