pavibhai commented on a change in pull request #1072:
URL: https://github.com/apache/orc/pull/1072#discussion_r837606381
##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
   ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
       "hive.exec.orc.max.disk.range.chunk.limit",
       Integer.MAX_VALUE - 1024,
       "When reading stripes >2GB, specify max limit for the chunk size."),
+  ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+      "hive.exec.orc.min.disk.seek.size",
+      0,
+      "When determining contiguous reads, gaps within this size are "
+      + "read contiguously and not seeked. Default value of zero disables this "
+      + "optimization"),
+  ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",

Review comment:
   > Would this patch be different from that?

   I can see the following differences:
   * Here, the decision to read extra bytes is based on the ORC read plan, rather than on a simple read-ahead.
   * The behavior can additionally be tuned for other file systems by configuring these values from ORC.

   > There can be cases when this could be reading more than necessary and throwing off the read bytes later. Would that cause perf penalties?

   There will be a trade-off between memory and CPU. Reading the extra bytes and dropping the extra bytes are each configurable, so either one or both can be turned off as needed.
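   To illustrate the first difference in the review comment (extra bytes are read because of the read plan, not because of a blanket read-ahead), here is a minimal sketch of gap-based coalescing. It is not the code from this pull request: `SeekGapSketch`, `Range`, and `coalesce` are hypothetical names, and the sketch assumes the planned ranges are sorted by offset and do not overlap. It models only the `orc.min.disk.seek.size` behavior described in the option text above, and does not model dropping the extra bytes after the read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ORC reader's actual planning code.
final class SeekGapSketch {

  /** A half-open byte range [offset, end) within a file. */
  static final class Range {
    final long offset;
    final long end;

    Range(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }
  }

  /**
   * Merges neighbouring planned ranges whose gap is at most minSeekSize bytes,
   * so the gap is read through instead of triggering a separate seek.
   * Assumes the input is sorted by offset and non-overlapping.
   * A minSeekSize of zero leaves the plan unchanged, mirroring the
   * "zero disables this optimization" wording of the option.
   */
  static List<Range> coalesce(List<Range> sorted, long minSeekSize) {
    List<Range> result = new ArrayList<>();
    Range current = null;
    for (Range next : sorted) {
      boolean closeEnough = current != null
          && minSeekSize > 0
          && next.offset - current.end <= minSeekSize;
      if (closeEnough) {
        // Extend the current read across the gap and the next range.
        current = new Range(current.offset, Math.max(current.end, next.end));
      } else {
        if (current != null) {
          result.add(current);
        }
        current = next;
      }
    }
    if (current != null) {
      result.add(current);
    }
    return result;
  }
}
```

   With planned ranges [0, 100), [150, 200), and [5000, 5100) and a minimum seek size of 64, the first two ranges become a single read of [0, 200) that includes 50 unneeded bytes, while the third range is still reached with a seek. Those 50 bytes are the memory side of the trade-off mentioned above.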