Paul Rogers created DRILL-5491:
----------------------------------

             Summary: NPE when reading a CSV file, with headers, but blank header line
                 Key: DRILL-5491
                 URL: https://issues.apache.org/jira/browse/DRILL-5491
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers

See DRILL-5490 for background. Try this unit test case:

{code}
FixtureBuilder builder = ClusterFixture.builder()
    .maxParallelization(1);

try (ClusterFixture cluster = builder.build();
     ClientFixture client = cluster.clientFixture()) {
  // CSV format that treats the first line as the header row.
  TextFormatConfig csvFormat = new TextFormatConfig();
  csvFormat.fieldDelimiter = ',';
  csvFormat.skipFirstLine = false;
  csvFormat.extractHeader = true;
  cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);

  // test7.csv is the input file shown below (blank first line).
  String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
  client.queryBuilder().sql(sql).printCsv();
}
{code}

The test can also be run as a query using your favorite client. Use this input file:

{code}

a,b,c
d,e,f
{code}

(The first line is blank.)

The following is the result:

{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NullPointerException
{code}

The {{RepeatedVarCharOutput}} class tries (but fails, for the reasons outlined in DRILL-5490) to detect this case.

The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:

{code}
String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
{code}

Because of bad code in {{RepeatedVarCharOutput.getTextOutput()}}:

{code}
public String [] getTextOutput () throws ExecutionSetupException {
  if (recordCount == 0 || fieldIndex == -1) {
    return null;
  }

  if (this.recordStart != characterData) {
    throw new ExecutionSetupException("record text was requested before finishing record");
  }
{code}

Since there is no text on the line, special code elsewhere (see DRILL-5490) elects not to increment {{recordCount}}. (BTW: {{recordCount}} is the total across-batch count; the in-batch count, {{batchIndex}}, was probably what was wanted here.) Since the count is zero, we return null.

But the author probably thought we'd get a zero-length record here, in which case the second if-statement would throw an exception. See DRILL-5490 for why that code does not actually work.

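The null-return path described above can be modeled in isolation. The following is a minimal sketch, not Drill code: the class {{BlankHeaderSketch}} and all of its methods are hypothetical stand-ins, and it assumes the NPE arises when the caller dereferences the null array returned for a blank header line. It contrasts an unguarded caller with one that checks for null and reports a descriptive error.

```java
/**
 * Simplified model of the blank-header failure. This is NOT Drill code;
 * every name in this class is hypothetical and exists only for illustration.
 */
class BlankHeaderSketch {

  /**
   * Stand-in for the header reader: a blank line yields null rather than
   * an empty array, mirroring the early "return null" in the report.
   */
  static String[] readHeaderFields(String headerLine) {
    if (headerLine.isEmpty()) {
      return null;
    }
    return headerLine.split(",");
  }

  /** Unguarded caller: a blank header line ends in a NullPointerException. */
  static int countFieldsUnguarded(String headerLine) {
    return readHeaderFields(headerLine).length;
  }

  /** Guarded caller: a blank header line is reported with a clear message. */
  static int countFieldsGuarded(String headerLine) {
    String[] fields = readHeaderFields(headerLine);
    if (fields == null) {
      throw new IllegalStateException(
          "CSV header line is blank; cannot extract column names");
    }
    return fields.length;
  }
}
```

With a header line of {{a,b,c}} both callers return 3. With an empty header line the unguarded caller throws {{NullPointerException}}, while the guarded one throws an exception that names the actual problem. Whether Drill should raise an error or fall back to some default for a blank header is a design decision this sketch does not settle.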
The result is one bug (not incrementing the record count) triggering another (returning null), which masks a third ({{recordStart}} is not set correctly, so the exception would not be thrown). All that bad code is just fun and games until we get an NPE, however.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)