Paul Rogers created DRILL-5490: ---------------------------------- Summary: RepeatedVarCharOutput.recordStart becomes corrupted on realloc Key: DRILL-5490 URL: https://issues.apache.org/jira/browse/DRILL-5490 Project: Apache Drill Issue Type: Bug Affects Versions: 1.10.0 Reporter: Paul Rogers Priority: Minor
The class {{RepeatedVarCharOutput}}, used to read text data into the {{columns}} array, has a member called {{recordStart}} which is supposed to track the memory offset of the start of the record data in the {{columns}} VarChar array. However, the logic for the member is very wrong. First, it is initialized based on an uninitialized variable: {{code}} public void startBatch() { this.recordStart = characterDataOriginal; ... loadVarCharDataAddress(); } private void loadVarCharDataAddress(){ ... this.characterDataOriginal = buf.memoryAddress(); ... } {{code}} Second, the class keeps track of actual memory addresses for the VarChar data. When data would exceed the buffer, memory is reallocated using the following method. But, this method *does not* adjust the {{recordStart}} member, which now points to a bogus memory address: {code} private void expandVarCharData(){ vector.getDataVector().reAlloc(); long diff = characterData - characterDataOriginal; loadVarCharDataAddress(); characterData += diff; } {code} Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable is never used in a way that cause obvious problems (else these issues would have been found sooner.) The only real use is: {code} @Override public boolean rowHasData() { return recordStart < characterData; } {code} Which, it turns out, is used only at EOF in {{TextReader.parseRecord()}}: {code} } catch(StreamFinishedPseudoException e) { // if we've written part of a field or all of a field, we should send this row. if (fieldsWritten == 0 && !output.rowHasData()) { throw e; } } {code} Thus, the bug will cause a failure only if: * A batch is resized, and * The new address is lower than the original address, and * It is the last batch in the file. Because of the low probability of hitting the bug, the priority is set to Minor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)