stiga-huang commented on code in PR #1073:
URL: https://github.com/apache/orc/pull/1073#discussion_r847858395


##########
c++/src/sargs/SargsApplier.cc:
##########
@@ -122,4 +124,58 @@ namespace orc {
     return mHasSelected;
   }
 
+  bool SargsApplier::evaluateColumnStatistics(
+                                    const PbColumnStatistics& colStats) const {
+    const SearchArgumentImpl * sargs =
+      dynamic_cast<const SearchArgumentImpl *>(mSearchArgument);
+    if (sargs == nullptr) {
+      throw InvalidArgument("Failed to cast to SearchArgumentImpl");
+    }
+
+    const std::vector<PredicateLeaf>& leaves = sargs->getLeaves();
+    std::vector<TruthValue> leafValues(
+      leaves.size(), TruthValue::YES_NO_NULL);
+
+    for (size_t pred = 0; pred != leaves.size(); ++pred) {
+      uint64_t columnId = mFilterColumns[pred];
+      if (columnId != INVALID_COLUMN_ID &&
+          colStats.size() > static_cast<int>(columnId)) {
+        leafValues[pred] = leaves[pred].evaluate(
+          mWriterVersion, colStats.Get(static_cast<int>(columnId)), nullptr);
+      }
+    }
+
+    return isNeeded(mSearchArgument->evaluate(leafValues));
+  }
+
+  bool SargsApplier::evaluateStripeStatistics(
+                            uint64_t rowsInStripe,
+                            const proto::StripeStatistics& stripeStats) {
+    if (stripeStats.colstats_size() == 0) {
+      return true;
+    }
+
+    bool ret = evaluateColumnStatistics(stripeStats.colstats());
+    if (!ret && mRowIndexStride > 0) {

Review Comment:
   I think we can remove these if we add a quick check after invoking 
RowReaderImpl::startNextStrip() (mentioned above). If this is not the last 
stripe, `mRowGroups` will be updated in processing the next stripe.



##########
c++/src/sargs/SargsApplier.cc:
##########
@@ -122,4 +124,58 @@ namespace orc {
     return mHasSelected;
   }
 
+  bool SargsApplier::evaluateColumnStatistics(
+                                    const PbColumnStatistics& colStats) const {

Review Comment:
   I think it should indent by 6 spaces. We should do the same indentation in 
`SargsApplier::evaluateStripeStatistics()`.



##########
c++/src/Reader.cc:
##########
@@ -1034,6 +1038,14 @@ namespace orc {
     rowIndexes.clear();
     bloomFilterIndex.clear();
 
+    // evaluate file statistics if it exists
+    if (sargsApplier && !sargsApplier->evaluateFileStatistics(*footer)) {
+      // skip the entire file
+      currentStripe = lastStripe;

Review Comment:
   We can add a quick path on checking this after invoking startNextStripe() at 
here: 
https://github.com/apache/orc/blob/9d1e4ad40c1760c689d4a7651c8bf7773421d25c/c%2B%2B/src/Reader.cc#L1111
   
   It's not cheap to invoke computeBatchSize() in the following logics. 
RowReaderImpl::next() can return false as what we do here:
   
https://github.com/apache/orc/blob/9d1e4ad40c1760c689d4a7651c8bf7773421d25c/c%2B%2B/src/Reader.cc#L1100-L1109



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to