pvary commented on a change in pull request #1339:
URL: https://github.com/apache/hive/pull/1339#discussion_r463406580



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##########
@@ -1574,45 +1576,46 @@ public int compareTo(CompressedOwid other) {
       this.orcSplit = orcSplit;
 
       try {
-        final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
-        if (deleteDeltaDirs.length > 0) {
+        if (orcSplit.getDeltas().size() > 0) {
           AcidOutputFormat.Options orcSplitMinMaxWriteIds =
               AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
           int totalDeleteEventCount = 0;
-          for (Path deleteDeltaDir : deleteDeltaDirs) {
-            if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
-              continue;
-            }
-            Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
-                new OrcRawRecordMerger.Options().isCompacting(false), null);
-            for (Path deleteDeltaFile : deleteDeltaFiles) {
-              try {
-                ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
-                OrcTail orcTail = readerData.orcTail;
-                if (orcTail.getFooter().getNumberOfRows() <= 0) {
-                  continue; // just a safe check to ensure that we are not 
reading empty delete files.
-                }
-                OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
-                if (!deleteKeyInterval.isIntersects(keyInterval)) {
-                  // If there is no intersection between data and delete 
delta, do not read delete file
-                  continue;
-                }
-                // Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
-                // For LLAP cases we need to create it here.
-                Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader :
-                    OrcFile.createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
-                totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
-                DeleteReaderValue deleteReaderValue = new 
DeleteReaderValue(deleteDeltaReader,
-                    deleteDeltaFile, readerOptions, bucket, validWriteIdList, 
isBucketedTable, conf,
-                    keyInterval, orcSplit);
-                DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
-                if (deleteReaderValue.next(deleteRecordKey)) {
-                  sortMerger.put(deleteRecordKey, deleteReaderValue);
-                } else {
-                  deleteReaderValue.close();
+          for (AcidInputFormat.DeltaMetaData deltaMetaData : 
orcSplit.getDeltas()) {
+            for (Path deleteDeltaDir : 
deltaMetaData.getPaths(orcSplit.getRootDir())) {
+              if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+                LOG.debug("Skipping delete delta dir {}", deleteDeltaDir);
+                continue;
+              }
+              for (AcidInputFormat.DeltaFileMetaData fileMetaData : 
deltaMetaData.getDeltaFiles()) {
+                Path deleteDeltaFile = fileMetaData.getPath(deleteDeltaDir, 
bucket);
+                try {
+                  ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag, fileMetaData.getFileId(deleteDeltaDir, bucket));
+                  OrcTail orcTail = readerData.orcTail;
+                  if (orcTail.getFooter().getNumberOfRows() <= 0) {
+                    continue; // just a safe check to ensure that we are not 
reading empty delete files.
+                  }
+                  OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+                  if (!deleteKeyInterval.isIntersects(keyInterval)) {
+                    // If there is no intersection between data and delete 
delta, do not read delete file
+                    continue;
+                  }
+                  // Reader can be reused if it was created before for getting 
orcTail: mostly for non-LLAP cache cases.
+                  // For LLAP cases we need to create it here.
+                  Reader deleteDeltaReader = readerData.reader != null ? 
readerData.reader : OrcFile
+                      .createReader(deleteDeltaFile, 
OrcFile.readerOptions(conf));
+                  totalDeleteEventCount += deleteDeltaReader.getNumberOfRows();
+                  DeleteReaderValue deleteReaderValue =
+                      new DeleteReaderValue(deleteDeltaReader, 
deleteDeltaFile, readerOptions, bucket, validWriteIdList,
+                          isBucketedTable, conf, keyInterval, orcSplit);
+                  DeleteRecordKey deleteRecordKey = new DeleteRecordKey();
+                  if (deleteReaderValue.next(deleteRecordKey)) {
+                    sortMerger.put(deleteRecordKey, deleteReaderValue);
+                  } else {
+                    deleteReaderValue.close();
+                  }
+                } catch (FileNotFoundException fnf) {

Review comment:
       Multi-table inserts also use stmtId when inserting data. Not sure if we 
can insert twice into the same table with a single query....




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to