[ 
https://issues.apache.org/jira/browse/DRILL-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001904#comment-16001904
 ] 

Paul Rogers commented on DRILL-5489:
------------------------------------

While we are at it, we might as well fix another bit of bogus code:

{code}
        for(Integer i : columnIds){
          maxField = 0;
          maxField = Math.max(maxField, i);
          ...
        }
{code}

The code to compute maxField simply uses the last value, not the maximum value. 
This will be thrown off by a query of the form:

{code}
SELECT columns[20], columns[1] FROM ...
{code}

This will muck up the text reader. The text reader "shortcuts" the rest of the 
line if it need not read any more fields. In {{TextReader.parseRecord()}}:

{code}
        earlyTerm = ! parseField();
        ...
        if (earlyTerm) {
          if (ch != newLine) {
            input.skipLines(1);
          }
          break;
        }
{code}

And:

{code}
  private final boolean parseField() throws IOException {
      ...
      return output.endField();
{code}

And, in {{RepeatedVarCharOutput.endField()}}:

{code}
  public boolean endField() {
    ...
    return fieldIndex < maxField;
  }
{code}

The only reason that this does not cause data loss is the insertion of an 
unnecessary sort earlier in the constructor:

{code}
        Collections.sort(columnIds);
      ...
        for (Integer i : columnIds){
{code}

We can remove the sort and compute the maxField correct and get the same result 
with simpler code.

> Unprotected array access in RepeatedVarCharOutput ctor
> ------------------------------------------------------
>
>                 Key: DRILL-5489
>                 URL: https://issues.apache.org/jira/browse/DRILL-5489
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Suppose a user runs a query of form:
> {code}
> SELECT columns[70000] FROM `dfs`.`mycsv.csv`
> {code}
> Internally, this will create a {{PathSegment}} to represent the selected 
> column. This is passed into the {{RepeatedVarCharOutput}} constructor where 
> it is used to set a flag in an array of 64K booleans. But, while the code is 
> very diligent of making sure that the column name is "columns" and that the 
> path segment is an array, it does not check the array value. Instead:
> {code}
>         for(Integer i : columnIds){
>           ...
>           fields[i] = true;
>         }
> {code}
> We need to add a bounds check to reject array indexes that are not valid: 
> negative or above 64K. It may be that the code further up the hierarchy does 
> the checks. But, if so, it should do the other checks as well. Leaving the 
> checks incomplete is confusing.
> The result:
> {code}
> Exception (no rows returned): 
> org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: ArrayIndexOutOfBoundsException: 70000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to