pavibhai commented on a change in pull request #1072:
URL: https://github.com/apache/orc/pull/1072#discussion_r837606381



##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
   ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
       "hive.exec.orc.max.disk.range.chunk.limit",
     Integer.MAX_VALUE - 1024, "When reading stripes >2GB, specify max limit for the chunk size."),
+  ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+                                 "hive.exec.orc.min.disk.seek.size",
+                                 0,
                         "When determining contiguous reads, gaps within this size are "
                         + "read contiguously and not seeked. Default value of zero disables this "
+                         + "optimization"),
+  ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",

Review comment:
       > Would this patch be different from that?
   
   I can see the following differences:
   * Here, the decision to read extra bytes is based on ORC's read plan rather than on a simple read-ahead.
   * The behavior can also be tuned for other file systems by configuring these values in ORC.
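To make the first point concrete, here is a minimal sketch of gap-based coalescing driven by a read plan. It is an illustration only, not the code in this PR: `SeekCoalescer`, `DiskRange` and `coalesce` are hypothetical names, and the sketch assumes the plan is a list of half-open byte ranges and that `minSeekSize` plays the role of `orc.min.disk.seek.size`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not ORC's implementation: merge planned reads whose
// gaps are no larger than minSeekSize, so the gap is read instead of seeking past it.
class SeekCoalescer {

  /** Hypothetical half-open byte range [offset, end) in the file. */
  static final class DiskRange {
    private final long offset;
    private final long end;

    DiskRange(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }

    long offset() { return offset; }
    long end() { return end; }
  }

  static List<DiskRange> coalesce(List<DiskRange> planned, long minSeekSize) {
    List<DiskRange> sorted = new ArrayList<>(planned);
    sorted.sort(Comparator.comparingLong(DiskRange::offset));

    List<DiskRange> result = new ArrayList<>();
    DiskRange current = null;
    for (DiskRange next : sorted) {
      if (current == null) {
        current = next;
      } else if (next.offset() - current.end() <= minSeekSize) {
        // Gap (or overlap) is small enough: extend the current read through it.
        current = new DiskRange(current.offset(), Math.max(current.end(), next.end()));
      } else {
        // Gap is too large: finish the current read and start a new one (a seek).
        result.add(current);
        current = next;
      }
    }
    if (current != null) {
      result.add(current);
    }
    return result;
  }
}
```

With `minSeekSize = 0`, only touching or overlapping ranges merge, so no gap bytes are ever read; larger values trade extra bytes read for fewer seeks.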
   
   > There can be cases when this could be reading more than necessary and throwing off the read bytes later. Would that cause perf penalties?
   
   There is a trade-off between memory and CPU. Both reading the extra bytes and dropping them afterwards are configurable, so either or both can be turned off as needed.
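A minimal sketch of the memory/CPU trade-off described above. The diff does not show how `orc.min.disk.seek.size.tolerance` is interpreted, so this sketch assumes, hypothetically, that the tolerance is the largest fraction of unrequested bytes allowed to stay in memory after a merged read. `ExtraByteTrimmer` and `extract` are illustrative names, not ORC APIs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ORC's implementation. Assumes `tolerance` is the
// largest fraction of unrequested bytes we are willing to keep in memory.
class ExtraByteTrimmer {

  /**
   * Splits a merged read that started at file offset mergedOffset into one
   * buffer per requested range [starts[i], ends[i]).
   */
  static List<ByteBuffer> extract(ByteBuffer merged, long mergedOffset,
                                  long[] starts, long[] ends, double tolerance) {
    long requested = 0;
    for (int i = 0; i < starts.length; i++) {
      requested += ends[i] - starts[i];
    }
    int total = merged.remaining();
    // Drop extra bytes only when too much of the merged buffer is unrequested.
    boolean drop = total > 0 && (double) (total - requested) / total > tolerance;

    List<ByteBuffer> out = new ArrayList<>();
    for (int i = 0; i < starts.length; i++) {
      int pos = merged.position() + (int) (starts[i] - mergedOffset);
      int len = (int) (ends[i] - starts[i]);
      ByteBuffer view = merged.duplicate();
      view.position(pos);
      view.limit(pos + len);
      view = view.slice(); // zero-copy view sharing the merged buffer
      if (drop) {
        // Copying costs CPU but lets the merged buffer be garbage-collected.
        ByteBuffer copy = ByteBuffer.allocate(len);
        copy.put(view);
        copy.flip();
        out.add(copy);
      } else {
        // No copy, but the whole merged buffer (extra bytes included) stays reachable.
        out.add(view);
      }
    }
    return out;
  }
}
```

Returning views avoids a copy but keeps every extra byte reachable; copying spends CPU so the extra bytes can be released, which is the trade-off the comment refers to.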




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
