Paul Rogers created DRILL-5612:
----------------------------------

             Summary: Random failure in TestMergeJoinWithSchemaChanges
                 Key: DRILL-5612
                 URL: https://issues.apache.org/jira/browse/DRILL-5612
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers


The unit test 
{{org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns}}
is subject to random failures, perhaps because the order in which the readers process the input files changes from run to run.

The test builds a number of input files, then executes queries against them. On 
most runs, the output is fine:

{code}
Running org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns
/home/.../target/1498606483211-0/mergejoin-schemachanges-left
/home/.../target/1498606483211-1/mergejoin-schemachanges-right
{code}

On occasion, however, the query fails:

{code}
org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges
testMissingAndNewColumns(org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges)  Time elapsed: 0.569 sec  <<< ERROR!
...: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas

Fragment 0:0

  (org.apache.drill.exec.exception.SchemaChangeException) Sort currently only supports a single schema.
    org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.build():152
    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():476
...
{code}

The code at the failing line cited in the exception above, in {{SortRecordBatchBuilder.build()}}:

{code}
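  public void build(VectorContainer outputContainer) throws SchemaChangeException {
    outputContainer.clear();
    // Incoming batches are grouped by schema; more than one group means the
    // sort received batches with differing schemas.
    if (batches.keySet().size() > 1) {
      throw new SchemaChangeException("Sort currently only supports a single schema.");
    }
    // ... (remainder of build() omitted)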
{code}
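A simplified, hypothetical model of this check may make the failure condition concrete. It is not Drill code: {{SingleSchemaCheckSketch}} and its nested {{SchemaChangeException}} are illustrative stand-ins, and a schema is reduced to an ordered list of column names (real Drill schemas also carry column types). Batches are grouped by schema, and the input is rejected when more than one group exists:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the check in SortRecordBatchBuilder.build().
// A "schema" is reduced to an ordered list of column names.
final class SingleSchemaCheckSketch {

  // Illustrative stand-in for Drill's SchemaChangeException.
  static final class SchemaChangeException extends Exception {
    SchemaChangeException(String message) {
      super(message);
    }
  }

  // Groups batch indexes by schema, then rejects the input if more than one
  // distinct schema was seen (the batches.keySet().size() > 1 test).
  static Map<List<String>, List<Integer>> build(List<List<String>> batchSchemas)
      throws SchemaChangeException {
    Map<List<String>, List<Integer>> batches = new LinkedHashMap<>();
    for (int i = 0; i < batchSchemas.size(); i++) {
      batches.computeIfAbsent(batchSchemas.get(i), schema -> new ArrayList<>()).add(i);
    }
    if (batches.keySet().size() > 1) {
      throw new SchemaChangeException("Sort currently only supports a single schema.");
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("k", "a");  // hypothetical columns
    List<String> right = Arrays.asList("k", "b"); // new column "b", "a" missing
    try {
      build(Arrays.asList(left, left));
      System.out.println("single schema: accepted");
      build(Arrays.asList(left, right));
      System.out.println("two schemas: accepted"); // not reached
    } catch (SchemaChangeException e) {
      System.out.println("two schemas: " + e.getMessage());
    }
  }
}
{code}

Running {{main}} accepts the single-schema input and rejects the two-schema input with the same message the sort reports, so whether the sort fails depends only on how many distinct schemas reach a single sort instance.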

This code has not changed in quite some time. The failure occurs in the 
"legacy" external sort ({{ExternalSortBatch}}, per the stack trace).

The external sort does support schema changes, but only through the union 
vector, which must be explicitly enabled. (Other tests validate that schema 
changes work.)

What is likely happening is that on some runs a single sort sees two files 
with differing schemas, while on other runs multiple threads execute in 
parallel so that each sort sees only one file. This speculation can be verified 
by checking a log file (not available from the test run that failed) to see 
whether the scan under the sort read more than one file.
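The speculation above can be illustrated with a toy model. It assumes, purely for illustration, that input files are dealt round-robin to {{width}} parallel fragments, each with its own sort; Drill's real assignment of files to fragments is not modeled, and {{FragmentAssignmentSketch}} is a hypothetical name. A fragment fails when the files it reads carry more than one distinct schema:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model: files are dealt round-robin to `width` parallel
// fragments, each with its own sort. Real Drill file assignment differs.
final class FragmentAssignmentSketch {

  // Returns true if any fragment would read files with more than one schema,
  // i.e. if some sort would hit the single-schema check.
  static boolean anyFragmentSeesMixedSchemas(List<String> fileSchemas, int width) {
    if (width < 1) {
      throw new IllegalArgumentException("width must be at least 1");
    }
    List<Set<String>> schemasPerFragment = new ArrayList<>();
    for (int f = 0; f < width; f++) {
      schemasPerFragment.add(new HashSet<>());
    }
    for (int i = 0; i < fileSchemas.size(); i++) {
      schemasPerFragment.get(i % width).add(fileSchemas.get(i));
    }
    for (Set<String> schemas : schemasPerFragment) {
      if (schemas.size() > 1) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Two input files whose schemas differ (names are illustrative).
    List<String> files = Arrays.asList("schemaA", "schemaB");
    System.out.println("width 1, mixed schemas: " + anyFragmentSeesMixedSchemas(files, 1));
    System.out.println("width 2, mixed schemas: " + anyFragmentSeesMixedSchemas(files, 2));
  }
}
{code}

With one fragment, both files reach the same sort, which would fail; with two, each sort sees a single schema. If the effective parallelism varies between runs, the same test could pass on one run and fail on the next, which would be consistent with the intermittent failure.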

Alternatively, the order of the JSON files may matter. File order may vary 
across machines, since directory listing on Linux does not guarantee any 
particular order.
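If file order is the culprit, one common way to make it repeatable in Java is to sort a directory listing explicitly, since the Javadoc for {{File.listFiles()}} gives no ordering guarantee. The sketch below is generic Java, not Drill's file-selection code; whether Drill sorts its inputs is not established here, and {{SortedListingSketch}} and the file names are hypothetical:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class SortedListingSketch {

  // Lists the regular files directly under `dir`, sorted by pathname.
  // File.listFiles() makes no ordering guarantee, so the explicit sort is
  // what makes the order repeatable across runs and machines.
  static File[] listSorted(File dir) throws IOException {
    File[] files = dir.listFiles(File::isFile);
    if (files == null) {
      throw new IOException("Not a readable directory: " + dir);
    }
    Arrays.sort(files); // File is Comparable: lexicographic by abstract pathname
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical input files, created out of order in a temporary directory.
    Path dir = Files.createTempDirectory("listing-sketch");
    dir.toFile().deleteOnExit(); // registered first, so deleted after its files
    for (String name : new String[] {"c.json", "a.json", "b.json"}) {
      Files.createFile(dir.resolve(name)).toFile().deleteOnExit();
    }
    for (File f : listSorted(dir.toFile())) {
      System.out.println(f.getName());
    }
  }
}
{code}

Sorting by pathname is only one choice; any total order works, as long as every machine applies the same one.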



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
