wgtmac commented on code in PR #1158:
URL: https://github.com/apache/orc/pull/1158#discussion_r912578816


##########
c++/src/sargs/SargsApplier.cc:
##########
@@ -173,16 +174,23 @@ namespace orc {
     if (!ret) {
       // reset mNextSkippedRows when the current stripe does not satisfy the 
PPD
       mNextSkippedRows.clear();
+      if (mMetrics != nullptr) {
+        mMetrics->EvaluatedRowGroupCount.fetch_add(stripeRowGroupCount);
+      }
     }
     return ret;
   }
 
-  bool SargsApplier::evaluateFileStatistics(const proto::Footer& footer) {
+  bool SargsApplier::evaluateFileStatistics(const proto::Footer& footer,
+                                            uint64_t fileRowGroupCount) {
     if (!mHasEvaluatedFileStats) {
       if (footer.statistics_size() == 0) {
         mFileStatsEvalResult = true;
       } else {
         mFileStatsEvalResult = evaluateColumnStatistics(footer.statistics());
+        if (!mFileStatsEvalResult && mMetrics != nullptr) {
+          mMetrics->EvaluatedRowGroupCount.fetch_add(fileRowGroupCount);

Review Comment:
   The unit of input split is stripe. If the current input split does not 
contain all stripes in the file, it is incorrect to count row groups of the 
whole file.



##########
c++/src/Reader.cc:
##########
@@ -286,6 +287,10 @@ namespace orc {
           lastStripe = i + 1;
         }
       }
+      if (footer->rowindexstride() > 0) {
+        fileRowGroupCount +=

Review Comment:
   I'd suggest to rename fileRowGroupCount to something like numSplitRowGroups, 
 numRequestedRowGroups or numRowGroupsInRange. Then refine it to count stripe 
in range only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to