David Mollitor created ORC-1063:
-----------------------------------

             Summary: Avoid ORC Reader Max Length Confusion
                 Key: ORC-1063
                 URL: https://issues.apache.org/jira/browse/ORC-1063
             Project: ORC
          Issue Type: Improvement
          Components: Java
    Affects Versions: 1.7.0
            Reporter: David Mollitor
            Assignee: David Mollitor

I just came across this confusion in the wild (i.e., on a production system).

{code:java|title=ReaderImpl.java}
  @Override
  public String toString() {
    StringBuilder buffer = new StringBuilder();
    buffer.append("ORC Reader(");
    buffer.append(path);
    if (maxLength != -1) {
      buffer.append(", ");
      buffer.append(maxLength);
    }
    buffer.append(")");
    return buffer.toString();
  }
{code}

{code:java|title=OrcConf.java}
  MAX_FILE_LENGTH("orc.max.file.length", "orc.max.file.length", Long.MAX_VALUE,
      "The maximum size of the file to read for finding the file tail. This\n" +
      "is primarily used for streaming ingest to read intermediate\n" +
      "footers while the file is still open"),
{code}

https://github.com/apache/orc/blob/883aae8757257a8314c0ece07e5ef0238600717c/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L1107-L1109

The two places disagree on how to express "there is no maximum value." The configuration uses {{Long.MAX_VALUE}} as its default to mean "no maximum", but the {{toString()}} code expects "no maximum" to be equal to -1.

I came across this because I saw some logging that indicated I had a file of length ~9000PB. This did not make any sense and was confusing. The number is presumably the default {{Long.MAX_VALUE}} (9223372036854775807, about 9223 PB if read as a byte count) being printed as the maximum length, because it is not equal to -1.

I suggest changing this so that any value less than 0 denotes "no maximum", and using a Java {{Optional}} to avoid this confusion again.
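
A minimal sketch of what the suggestion could look like, not the actual ORC code. The class {{MaxLengthSketch}} and the helpers {{toMaxLength}} and {{describe}} are hypothetical names used only for illustration. It uses {{OptionalLong}} as the {{Optional}} variant for a primitive {{long}}. Treating {{Long.MAX_VALUE}} as "no maximum" alongside negative values is an extra assumption, made so that the existing default keeps working.

{code:java|title=MaxLengthSketch.java (hypothetical)}
import java.util.OptionalLong;

final class MaxLengthSketch {

  // Any value less than 0 denotes "no maximum".
  // Assumption: Long.MAX_VALUE (the current OrcConf default) is also
  // treated as "no maximum" so existing configurations behave the same.
  static OptionalLong toMaxLength(long configured) {
    if (configured < 0 || configured == Long.MAX_VALUE) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(configured);
  }

  // Same layout as ReaderImpl.toString(), but the length is appended
  // only when a real maximum is present, so no sentinel is ever printed.
  static String describe(String path, OptionalLong maxLength) {
    StringBuilder buffer = new StringBuilder();
    buffer.append("ORC Reader(");
    buffer.append(path);
    if (maxLength.isPresent()) {
      buffer.append(", ");
      buffer.append(maxLength.getAsLong());
    }
    buffer.append(")");
    return buffer.toString();
  }

  public static void main(String[] args) {
    String path = "/tmp/example.orc"; // hypothetical path
    System.out.println(describe(path, toMaxLength(Long.MAX_VALUE))); // ORC Reader(/tmp/example.orc)
    System.out.println(describe(path, toMaxLength(-1)));             // ORC Reader(/tmp/example.orc)
    System.out.println(describe(path, toMaxLength(4096)));           // ORC Reader(/tmp/example.orc, 4096)
  }
}
{code}

With the sentinel check done once at the configuration boundary, callers only ever see "present" or "absent" and never need to know which magic number was used.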