Paul Rogers created DRILL-5491:
----------------------------------

             Summary: NPE when reading a CSV file, with headers, but blank header line
                 Key: DRILL-5491
                 URL: https://issues.apache.org/jira/browse/DRILL-5491
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


See DRILL-5490 for background.

Try this unit test case:

{code}
    FixtureBuilder builder = ClusterFixture.builder()
        .maxParallelization(1);

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
      client.queryBuilder().sql(sql).printCsv();
    }
{code}

The test can also be run as a query using your favorite client.

Using this input file:

{code}

a,b,c
d,e,f
{code}

(The first line is blank.)
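
The input file can also be created programmatically. The following is a minimal sketch, assuming the workspace root passed to {{defineWorkspace()}} above ({{/tmp/data}}) maps directly to the local file system. The class and method names ({{BlankHeaderCsv}}, {{write}}) are illustrative and not part of Drill.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlankHeaderCsv {
  // Writes csv/test7.csv under the given workspace root and returns its path.
  public static Path write(Path workspaceRoot) throws IOException {
    Path file = workspaceRoot.resolve("csv").resolve("test7.csv");
    Files.createDirectories(file.getParent());
    // The first line is intentionally empty: this is the blank header line.
    String content = "\n" + "a,b,c\n" + "d,e,f\n";
    Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    return file;
  }
}
```

Calling {{BlankHeaderCsv.write(Paths.get("/tmp/data"))}} before running the test should place the file where the query expects it, if the assumption about the workspace root holds.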

The following is the result:

{code}
Exception (no rows returned): 
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: NullPointerException
{code}

The {{RepeatedVarCharOutput}} class tries (but fails for the reasons outlined 
in DRILL-5490) to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

{code}
    String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}

Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

{code}
  public String [] getTextOutput () throws ExecutionSetupException {
    if (recordCount == 0 || fieldIndex == -1) {
      return null;
    }

    if (this.recordStart != characterData) {
      throw new ExecutionSetupException("record text was requested before finishing record");
    }
{code}

Since there is no text on the line, special code elsewhere (see DRILL-5490) 
elects not to increment {{recordCount}}. (BTW: {{recordCount}} is the total 
count across batches; the in-batch count, {{batchIndex}}, was probably what 
was wanted here.) Since the count is zero, we return null.

The author probably thought we'd get a zero-length record here, in which case 
the second if-statement would throw an exception. But see DRILL-5490 for why 
that code does not actually work.

The result is one bug (not incrementing the record count) triggering another 
(returning a null), which masks a third ({{recordStart}} is not set correctly, 
so the exception would not be thrown).
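
The chain described above can be reduced to a small stand-alone sketch. This is not Drill's code: {{HeaderNullSketch}} and its nested {{Output}} class are hypothetical stand-ins that model only the two fields the guard in {{getTextOutput()}} inspects. The guarded variant shows one possible way a caller could surface a readable error instead of an NPE; this ticket does not say what the actual fix should be.

```java
public class HeaderNullSketch {
  // Hypothetical stand-in for the header output class.
  public static class Output {
    public long recordCount = 0;  // never incremented for a blank line
    public int fieldIndex = -1;   // no field was ever started
    public String[] fields = new String[0];

    public String[] getTextOutput() {
      if (recordCount == 0 || fieldIndex == -1) {
        return null;              // the value the caller does not expect
      }
      return fields;
    }
  }

  // Mirrors the unguarded caller: dereferences the result directly.
  public static int unguardedFieldCount(Output out) {
    String[] fieldNames = out.getTextOutput();
    return fieldNames.length;     // NullPointerException when fieldNames is null
  }

  // Assumed guard (illustrative only): turn the null into a descriptive error.
  public static int guardedFieldCount(Output out) {
    String[] fieldNames = out.getTextOutput();
    if (fieldNames == null) {
      throw new IllegalStateException(
          "CSV header line is blank; no column names found");
    }
    return fieldNames.length;
  }
}
```

With a freshly constructed {{Output}}, the unguarded call fails with a {{NullPointerException}}, while the guarded call fails with a message that names the actual problem.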

All that bad code is just fun and games until we get an NPE, however.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
