rbalamohan commented on a change in pull request #1072: URL: https://github.com/apache/orc/pull/1072#discussion_r836922547
##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
 ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
     "hive.exec.orc.max.disk.range.chunk.limit",
     Integer.MAX_VALUE - 1024, "When reading stripes >2GB, specify max limit for the chunk size."),
+ ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+     "hive.exec.orc.min.disk.seek.size",
+     0,
+     "When determining contiguous reads, gaps within this size are "
+     + "read contiguously and not seeked. Default value of zero disables this "
+     + "optimization"),
+ ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",

Review comment:
Thanks for sharing the patch. The AWS S3 connectors have readahead enabled by default (mostly set to 64 or 128KB), so in a way data is already read beyond what was requested.

1. How would this patch differ from that readahead?
2. There can be cases where this reads more than necessary and discards the extra bytes later. Would that cause a perf penalty?
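
To make both questions concrete, here is a minimal sketch of what gap-tolerant range merging could look like. It is not the code from the patch: `GapMergeSketch`, `Range`, `merge` and `extraBytes` are hypothetical names, and the rule "merge two ranges when the gap between them is at most the configured size" is an assumption based only on the `orc.min.disk.seek.size` description in the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class GapMergeSketch {
  /** Half-open byte range [offset, end). Hypothetical type, not an ORC class. */
  public static final class Range {
    public final long offset;
    public final long end;

    public Range(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }

    public long length() {
      return end - offset;
    }
  }

  /**
   * Merges ranges (sorted by offset) whose gap is at most minSeekSize bytes.
   * With minSeekSize == 0 only touching or overlapping ranges are merged,
   * so no unrequested bytes are read (assumed meaning of "disabled").
   */
  public static List<Range> merge(List<Range> sorted, long minSeekSize) {
    List<Range> merged = new ArrayList<>();
    for (Range next : sorted) {
      int lastIdx = merged.size() - 1;
      if (lastIdx >= 0 && next.offset - merged.get(lastIdx).end <= minSeekSize) {
        // Gap is small enough: extend the previous read instead of seeking.
        Range last = merged.get(lastIdx);
        merged.set(lastIdx, new Range(last.offset, Math.max(last.end, next.end)));
      } else {
        merged.add(next);
      }
    }
    return merged;
  }

  /** Bytes read but never requested; assumes the requested ranges do not overlap. */
  public static long extraBytes(List<Range> requested, List<Range> merged) {
    long wanted = 0;
    long read = 0;
    for (Range r : requested) {
      wanted += r.length();
    }
    for (Range r : merged) {
      read += r.length();
    }
    return read - wanted;
  }

  public static void main(String[] args) {
    List<Range> requested = List.of(
        new Range(0, 100), new Range(150, 200), new Range(5000, 5100));
    List<Range> merged = merge(requested, 64);
    // prints "2 reads, 50 extra bytes"
    System.out.println(merged.size() + " reads, "
        + extraBytes(requested, merged) + " extra bytes");
  }
}
```

In this sketch the 50-byte gap is read and thrown away in exchange for one fewer seek, while the 4800-byte gap still costs a seek. Whether that trade is a net win on top of connector-level readahead is exactly what the two questions above are asking.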