wgtmac commented on code in PR #1087:
URL: https://github.com/apache/orc/pull/1087#discussion_r852538936


##########
c++/src/Reader.cc:
##########
@@ -1186,41 +1186,46 @@ namespace orc {
                                            uint64_t currentRowInStripe,
                                            uint64_t rowsInCurrentStripe,
                                            uint64_t rowIndexStride,
-                                           const std::vector<bool>& includedRowGroups) {
+                                           const std::vector<uint64_t>& nextSkippedRows) {
     // In case of PPD, batch size should be aware of row group boundaries. If only a subset of row
     // groups are selected then marker position is set to the end of range (subset of row groups
     // within stripe).
     uint64_t endRowInStripe = rowsInCurrentStripe;
-    if (!includedRowGroups.empty()) {
-      endRowInStripe = currentRowInStripe;
-      uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
-      for (; rg < includedRowGroups.size(); ++rg) {

Review Comment:
   Thanks for the further investigation! I am OK with either approach. Would it
   perform better if we also broke out of the loop early in computeBatchSize(),
   even with your fix?
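
   To make the question concrete, here is a minimal sketch of one way to read "break the loop": stop scanning row groups as soon as the selected range already covers the requested batch size. It is a stand-alone illustration built on the removed `includedRowGroups` variant, not the code in this PR. The function name `computeBatchSizeSketch` and the second `break` are assumptions, and `rowIndexStride` is assumed to be non-zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone helper for illustration only; not the ORC implementation.
// Returns how many rows the next batch may contain without crossing into a
// row group that was not selected by predicate pushdown (PPD).
uint64_t computeBatchSizeSketch(uint64_t requestedSize,
                                uint64_t currentRowInStripe,
                                uint64_t rowsInCurrentStripe,
                                uint64_t rowIndexStride,
                                const std::vector<bool>& includedRowGroups) {
  // Without PPD information the batch may run to the end of the stripe.
  uint64_t endRowInStripe = rowsInCurrentStripe;
  if (!includedRowGroups.empty()) {
    endRowInStripe = currentRowInStripe;
    // Assumes rowIndexStride > 0.
    std::size_t rg = static_cast<std::size_t>(currentRowInStripe / rowIndexStride);
    for (; rg < includedRowGroups.size(); ++rg) {
      if (!includedRowGroups[rg]) {
        break;  // the first unselected row group ends the contiguous range
      }
      endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
      if (endRowInStripe - currentRowInStripe >= requestedSize) {
        break;  // assumed early exit: the range already covers the request
      }
    }
  }
  return std::min(requestedSize, endRowInStripe - currentRowInStripe);
}
```

   With the early exit, the scan cost is bounded by the requested batch size rather than by the number of consecutive selected row groups; the returned value is the same either way.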


