stiga-huang commented on code in PR #1087:
URL: https://github.com/apache/orc/pull/1087#discussion_r852537049


##########
c++/src/Reader.cc:
##########
@@ -1186,41 +1186,46 @@ namespace orc {
                                            uint64_t currentRowInStripe,
                                            uint64_t rowsInCurrentStripe,
                                            uint64_t rowIndexStride,
-                                           const std::vector<bool>& 
includedRowGroups) {
+                                           const std::vector<uint64_t>& 
nextSkippedRows) {
     // In case of PPD, batch size should be aware of row group boundaries. If 
only a subset of row
     // groups are selected then marker position is set to the end of range 
(subset of row groups
     // within stripe).
     uint64_t endRowInStripe = rowsInCurrentStripe;
-    if (!includedRowGroups.empty()) {
-      endRowInStripe = currentRowInStripe;
-      uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
-      for (; rg < includedRowGroups.size(); ++rg) {

Review Comment:
   Ah, this is a much simpler fix! Maybe the current PR is just better than 
this in some corner cases, e.g. small RowGroup stride + large requestedSize.
   
   I redo the experiments and they have similar improvements:
   
   || Original | Current PR | New approach |
   -- | -- | -- | --
   Time taken (s) | 0.720241394 | 0.514005243 | 0.517543660
   task-clock (msec) | 719.469567 | 513.202998 | 516.743815
   context-switches | 0 | 0 | 0
   cpu-migrations | 0 | 0 | 0
   page-faults | 3,209 | 3,216 | 3,210
   cycles | 3,184,577,093 | 2,309,953,292 | 2,257,880,102
   instructions | 8,725,559,709 | 5,989,406,509 | 5,993,877,602
   branches | 1,143,366,754 | 752,891,327 | 753,289,704
   branch-misses | 12,242,784 | 11,990,373 | 12,041,485
   
   I'm ok if we choose the simpler fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to