[GitHub] [orc] pavibhai commented on a change in pull request #1072: ORC-1138

GitBox Tue, 29 Mar 2022 08:22:15 -0700


pavibhai commented on a change in pull request #1072:
URL: https://github.com/apache/orc/pull/1072#discussion_r837606381




##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
   ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
       "hive.exec.orc.max.disk.range.chunk.limit",
     Integer.MAX_VALUE - 1024, "When reading stripes >2GB, specify max limit for the chunk size."),
+  ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+                                 "hive.exec.orc.min.disk.seek.size",
+                                 0,
+                         "When determining contiguous reads, gaps within this size are "
+                         + "read contiguously and not seeked. Default value of zero disables this "
+                         + "optimization"),
+  ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",

Review comment:
       > Would this patch be different from that?
   
   I can see the following differences:
   * Here the decision to read extra bytes is driven by the ORC read plan rather than by a simple read-ahead.
     * We do not request extra bytes unless the read plan continues past the gap.
     * In our tests, 4MB seems to work well, as shown in the benchmark results.
   * The values can also be tuned for other file systems through configuration.
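
To make the first difference concrete, here is a minimal Java sketch of gap-based coalescing. It is not the code from this PR: `SeekGapSketch`, `Range`, and `coalesce` are hypothetical names, and the merge rule (gap no larger than the minimum seek size) is an assumption based on the option description in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the ORC code base.
final class SeekGapSketch {
  /** A half-open byte range [offset, end) that the read plan needs. */
  static final class Range {
    final long offset;
    final long end;

    Range(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + offset + ", " + end + ")";
    }
  }

  /**
   * Coalesces ranges that are sorted by offset. When the gap to the next
   * planned range is at most minSeekSize, the gap is read through instead
   * of seeking. With minSeekSize = 0 only touching or overlapping ranges
   * merge, so no unneeded bytes are read.
   */
  static List<Range> coalesce(List<Range> sorted, long minSeekSize) {
    List<Range> reads = new ArrayList<>();
    Range current = null;
    for (Range next : sorted) {
      if (current != null && next.offset - current.end <= minSeekSize) {
        // Extend the current read across the gap to cover the next range.
        current = new Range(current.offset, Math.max(current.end, next.end));
      } else {
        if (current != null) {
          reads.add(current);
        }
        current = next;
      }
    }
    if (current != null) {
      reads.add(current);
    }
    return reads;
  }

  public static void main(String[] args) {
    List<Range> plan = Arrays.asList(
        new Range(0, 1_000),
        new Range(1_500, 2_000),             // 500-byte gap before this range
        new Range(10_000_000, 10_001_000));  // roughly 10MB gap before this range
    System.out.println(coalesce(plan, 4L * 1024 * 1024));
    System.out.println(coalesce(plan, 0));
  }
}
```

With the 4MB value from the benchmarks, the 500-byte gap is read through while the roughly 10MB gap still causes a seek. Because only gaps between planned ranges are considered, nothing is read past the last range, unlike a plain read-ahead.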
   
   > There can be cases when this could be reading more than necessary and throwing off the read bytes later. Would that cause perf penalties?
   
   There is a trade-off between memory and CPU. Both reading the extra bytes and dropping them afterwards are configurable, so either or both can be turned off as needed.
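
The memory versus CPU trade-off could look roughly like the sketch below. It assumes the tolerance is a ratio of extra bytes to needed bytes, which this thread does not spell out; `ToleranceSketch`, `shouldDropExtra`, and `copyNeeded` are hypothetical names, not ORC APIs.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; the tolerance rule here is an assumption.
final class ToleranceSketch {
  /**
   * Assumed rule: drop the extra bytes once they exceed the given
   * fraction of the bytes that were actually needed.
   */
  static boolean shouldDropExtra(long bytesRead, long bytesNeeded, double tolerance) {
    if (bytesNeeded <= 0) {
      return bytesRead > 0;
    }
    return (bytesRead - bytesNeeded) / (double) bytesNeeded > tolerance;
  }

  /**
   * Copies [start, start + length) out of the merged read so the larger
   * buffer can be released. This costs CPU for the copy but saves memory.
   */
  static ByteBuffer copyNeeded(ByteBuffer merged, int start, int length) {
    ByteBuffer view = merged.duplicate();
    view.position(start);
    view.limit(start + length);
    ByteBuffer copy = ByteBuffer.allocate(length);
    copy.put(view);
    copy.flip();
    return copy;
  }
}
```

Keeping the merged buffer avoids the copy at the price of holding unneeded bytes in memory; copying the needed slices does the opposite.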



