[ https://issues.apache.org/jira/browse/DRILL-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001904#comment-16001904 ]
Paul Rogers commented on DRILL-5489: ------------------------------------ While we are at it, we might as well fix another bit of bogus code: {code} for(Integer i : columnIds){ maxField = 0; maxField = Math.max(maxField, i); ... } {code} The code to compute maxField simply uses the last value, not the maximum value. This will be thrown off by a query of the form: {code} SELECT columns[20], columns[1] FROM ... {code} This will muck up the text reader. The text reader "shortcuts" the rest of the line if it need not read any more fields. In {{TextReader.parseRecord()}}: {code} earlyTerm = ! parseField(); ... if (earlyTerm) { if (ch != newLine) { input.skipLines(1); } break; } {code} And: {code} private final boolean parseField() throws IOException { ... return output.endField(); {code} And, in {{RepeatedVarCharOutput.endField()}}: {code} public boolean endField() { ... return fieldIndex < maxField; } {code} The only reason that this does not cause data loss is the insertion of an unnecessary sort earlier in the constructor: {code} Collections.sort(columnIds); ... for (Integer i : columnIds){ {code} We can remove the sort and compute the maxField correct and get the same result with simpler code. > Unprotected array access in RepeatedVarCharOutput ctor > ------------------------------------------------------ > > Key: DRILL-5489 > URL: https://issues.apache.org/jira/browse/DRILL-5489 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.10.0 > Reporter: Paul Rogers > Priority: Minor > > Suppose a user runs a query of form: > {code} > SELECT columns[70000] FROM `dfs`.`mycsv.csv` > {code} > Internally, this will create a {{PathSegment}} to represent the selected > column. This is passed into the {{RepeatedVarCharOutput}} constructor where > it is used to set a flag in an array of 64K booleans. But, while the code is > very diligent of making sure that the column name is "columns" and that the > path segment is an array, it does not check the array value. Instead: > {code} > for(Integer i : columnIds){ > ... > fields[i] = true; > } > {code} > We need to add a bounds check to reject array indexes that are not valid: > negative or above 64K. It may be that the code further up the hierarchy does > the checks. But, if so, it should do the other checks as well. Leaving the > checks incomplete is confusing. > The result: > {code} > Exception (no rows returned): > org.apache.drill.common.exceptions.UserRemoteException: > SYSTEM ERROR: ArrayIndexOutOfBoundsException: 70000 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)