rbalamohan commented on a change in pull request #1072:
URL: https://github.com/apache/orc/pull/1072#discussion_r836922547



##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
   ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
       "hive.exec.orc.max.disk.range.chunk.limit",
     Integer.MAX_VALUE - 1024, "When reading stripes >2GB, specify max limit for the chunk size."),
+  ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+                                 "hive.exec.orc.min.disk.seek.size",
+                                 0,
+                         "When determining contiguous reads, gaps within this size are "
+                         + "read contiguously and not seeked. Default value of zero disables this "
+                         + "optimization"),
+  ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",

Review comment:
       Thanks for sharing the patch. The AWS S3 connectors have readahead enabled
by default (mostly set to 64 or 128KB), so in a way data is already read beyond
what is requested.
   1. Would this patch be different from that?
   2. There can be cases where this reads more than necessary and throws away
the extra bytes later. Would that cause perf penalties?
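
To make the second question concrete, here is a minimal sketch of how I read the `orc.min.disk.seek.size` description: sorted ranges whose gap is within the threshold get merged into one contiguous read, and the gap bytes are read and then discarded. This is not the code from the patch; `SeekGapSketch`, `Range`, `mergeRanges` and `extraBytes` are hypothetical names used only for illustration, and the input is assumed to be sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

public class SeekGapSketch {
  /** Half-open byte range [start, end); a hypothetical stand-in, not ORC's own range type. */
  public static final class Range {
    public final long start;
    public final long end;

    public Range(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Merges ranges whose gap to the previous range is at most minSeekSize.
   * Assumes the input is sorted by start and non-overlapping.
   * With minSeekSize == 0 only ranges that touch are merged, so no gap bytes are read.
   */
  public static List<Range> mergeRanges(List<Range> sorted, long minSeekSize) {
    List<Range> merged = new ArrayList<>();
    Range current = null;
    for (Range r : sorted) {
      if (current != null && r.start - current.end <= minSeekSize) {
        // Gap is small enough: extend the current read across the gap.
        current = new Range(current.start, Math.max(current.end, r.end));
      } else {
        if (current != null) {
          merged.add(current);
        }
        current = r;
      }
    }
    if (current != null) {
      merged.add(current);
    }
    return merged;
  }

  /** Bytes covered by the merged reads that were never requested (the discarded gap bytes). */
  public static long extraBytes(List<Range> requested, List<Range> merged) {
    long requestedBytes = 0;
    long readBytes = 0;
    for (Range r : requested) {
      requestedBytes += r.end - r.start;
    }
    for (Range r : merged) {
      readBytes += r.end - r.start;
    }
    return readBytes - requestedBytes;
  }
}
```

In this sketch the over-read is bounded by the threshold times the number of merged gaps, so the penalty I am asking about would depend on how large the threshold is set and how many small gaps a stripe has.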




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

