[ https://issues.apache.org/jira/browse/DRILL-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-5491:
-------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: DRILL-5498

> NPE when reading a CSV file, with headers, but blank header line
> ----------------------------------------------------------------
>
>                 Key: DRILL-5491
>                 URL: https://issues.apache.org/jira/browse/DRILL-5491
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>
> See DRILL-5490 for background.
> Try this unit test case:
> {code}
>     FixtureBuilder builder = ClusterFixture.builder()
>         .maxParallelization(1);
>     try (ClusterFixture cluster = builder.build();
>          ClientFixture client = cluster.clientFixture()) {
>       TextFormatConfig csvFormat = new TextFormatConfig();
>       csvFormat.fieldDelimiter = ',';
>       csvFormat.skipFirstLine = false;
>       // Use the first line of the file as the column names
>       csvFormat.extractHeader = true;
>       // Register workspace dfs.data, rooted at /tmp/data, using the format above
>       cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
>       String sql = "SELECT * FROM `dfs.data`.`csv/test7.csv`";
>       client.queryBuilder().sql(sql).printCsv();
>     }
> {code}
> The test can also be run as a query using your favorite client.
> Using this input file:
> {code}
>
> a,b,c
> d,e,f
> {code}
> (The first line is blank.)
> The following is the result:
> {code}
> Exception (no rows returned):
> org.apache.drill.common.exceptions.UserRemoteException:
> SYSTEM ERROR: NullPointerException
> {code}
> The {{RepeatedVarCharOutput}} class tries (but fails, for the reasons
> outlined in DRILL-5490) to detect this case.
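The blank first line is the whole trigger for this bug, and it is easy to lose when copying the input by hand. The sketch below writes the reproduction file with that empty line preserved. It assumes, as in the test fixture above, that the `dfs.data` workspace is rooted at `/tmp/data`, so the table `csv/test7.csv` resolves to `/tmp/data/csv/test7.csv`; the class name `WriteTest7Csv` and the method `writeTestFile` are hypothetical helpers, not part of Drill.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes the DRILL-5491 reproduction input.
public class WriteTest7Csv {

    // Creates <workspaceRoot>/csv/test7.csv with a blank first line
    // followed by two data lines, and returns the file path.
    static Path writeTestFile(Path workspaceRoot) throws IOException {
        Path file = workspaceRoot.resolve("csv").resolve("test7.csv");
        Files.createDirectories(file.getParent());
        // The leading "\n" is the blank header line that leads to the NPE.
        Files.write(file, "\na,b,c\nd,e,f\n".getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the workspace root used in the test fixture above.
        Path file = writeTestFile(Paths.get("/tmp/data"));
        System.out.println("Wrote " + file);
    }
}
```

Writing the bytes explicitly, rather than pasting the text into an editor, avoids editors or copy/paste silently trimming the leading empty line.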
> The code crashes here in {{CompliantTextRecordReader.extractHeader()}}:
> {code}
> String [] fieldNames = ((RepeatedVarCharOutput)hOutput).getTextOutput();
> {code}
> This happens because of bad code in
> {{RepeatedVarCharOutput.getTextOutput()}}:
> {code}
>   public String [] getTextOutput () throws ExecutionSetupException {
>     // Returns null (not an empty array) when no record was counted
>     if (recordCount == 0 || fieldIndex == -1) {
>       return null;
>     }
>
>     // Intended to catch a request for text before the record is complete
>     if (this.recordStart != characterData) {
>       throw new ExecutionSetupException("record text was requested before finishing record");
>     }
>     // ... (remainder of method omitted)
>   }
> {code}
> Since there is no text on the line, special code elsewhere (see DRILL-5490)
> elects not to increment {{recordCount}}. (Note: {{recordCount}} is the total
> count across batches; the in-batch count, {{batchIndex}}, was probably
> intended here.) Since the count is zero, the method returns null.
> The author probably expected a zero-length record instead, in which case the
> second if-statement would throw an exception. But see DRILL-5490 for why
> that check does not actually work.
> The result is one bug (not incrementing the record count) triggering another
> (returning null), which masks a third ({{recordStart}} is not set correctly,
> so the exception would not be thrown).
> All that bad code is just fun and games until we get an NPE, however.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
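To make the failure chain concrete, the self-contained sketch below models it outside Drill. It is not Drill's code: `HeaderOutput`, `consumeLine`, `countColumnsUnguarded`, and `countColumnsGuarded` are hypothetical names. The model only mirrors the behavior described in the issue: a blank line does not bump `recordCount`, so `getTextOutput()` returns null, and a caller that dereferences the result fails with an NPE. The guarded caller shows one possible defensive check (reporting the blank header explicitly); it is an illustration, not the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the header-extraction path; not Drill's classes.
public class BlankHeaderSketch {

    // Stand-in for the header output object (loosely after RepeatedVarCharOutput).
    static class HeaderOutput {
        private final List<String> fields = new ArrayList<>();
        private int recordCount = 0;

        // As described in the issue: a blank line adds no fields and
        // does not increment recordCount.
        void consumeLine(String line) {
            if (line.isEmpty()) {
                return;
            }
            for (String f : line.split(",", -1)) {
                fields.add(f);
            }
            recordCount++;
        }

        // Mirrors the reported behavior: null when no record was counted.
        String[] getTextOutput() {
            if (recordCount == 0) {
                return null;
            }
            return fields.toArray(new String[0]);
        }
    }

    // Unguarded caller: dereferences the result, so a blank header gives an NPE.
    static int countColumnsUnguarded(HeaderOutput out) {
        String[] fieldNames = out.getTextOutput();
        return fieldNames.length;
    }

    // Guarded caller: reports the actual problem instead of an NPE.
    static int countColumnsGuarded(HeaderOutput out) {
        String[] fieldNames = out.getTextOutput();
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException(
                "extractHeader is set, but the header line is blank");
        }
        return fieldNames.length;
    }

    public static void main(String[] args) {
        HeaderOutput blank = new HeaderOutput();
        blank.consumeLine("");  // first line of test7.csv is blank
        try {
            countColumnsUnguarded(blank);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        try {
            countColumnsGuarded(blank);
        } catch (IllegalArgumentException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

Whether a blank header line should be rejected or skipped is a separate design question; the sketch only shows that checking the result at the call site turns an opaque NPE into an actionable message.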