Paul Rogers created DRILL-5134:
----------------------------------
Summary: TestMergeJoinWithSchemaChanges throws exception with paged SV4
Key: DRILL-5134
URL: https://issues.apache.org/jira/browse/DRILL-5134
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Paul Rogers
Priority: Minor
The {{TestMergeJoinWithSchemaChanges}} test exercises the in-memory merge sort
with union vectors. (Note that union vectors are not fully supported.)
The merge sort creates an SV4 (four-byte selection vector) to hold an index
into the sorted results. An SV4 can page those results to the downstream
operator as a series of batches.
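To make the paging idea concrete, here is a minimal, standalone sketch of an
SV4-style index. It assumes the usual layout in which each four-byte entry
holds a batch number in its upper 16 bits and a row offset in its lower 16
bits. The class and method names ({{Sv4Sketch}}, {{encode}}, {{page}}) are
hypothetical and are not Drill's API; the real implementation works on direct
memory buffers rather than int arrays.

```java
import java.util.Arrays;

final class Sv4Sketch {
    // Pack a (batch, row) pair into one 4-byte entry:
    // batch number in the upper 16 bits, row offset in the lower 16 bits.
    static int encode(int batchIndex, int rowIndex) {
        return (batchIndex << 16) | (rowIndex & 0xFFFF);
    }

    // Recover the batch number from an entry (unsigned shift).
    static int batchOf(int entry) {
        return entry >>> 16;
    }

    // Recover the row offset within the batch from an entry.
    static int rowOf(int entry) {
        return entry & 0xFFFF;
    }

    // Return the slice of the sorted index that makes up one page.
    // A page past the end of the index is empty.
    static int[] page(int[] index, int pageNumber, int pageSize) {
        long start = (long) pageNumber * pageSize;
        if (start >= index.length) {
            return new int[0];
        }
        int from = (int) start;
        int to = (int) Math.min(start + pageSize, (long) index.length);
        return Arrays.copyOfRange(index, from, to);
    }

    public static void main(String[] args) {
        // Five sorted entries that point into two hypothetical input batches.
        int[] index = {
            encode(1, 0), encode(0, 2), encode(0, 0), encode(1, 1), encode(0, 1)
        };
        // Walk the index two entries at a time, as a paged reader would.
        for (int p = 0; page(index, p, 2).length > 0; p++) {
            for (int entry : page(index, p, 2)) {
                System.out.println("page " + p + ": batch " + batchOf(entry)
                    + ", row " + rowOf(entry));
            }
        }
    }
}
```

In this sketch, returning all rows at once corresponds to a single page whose
size equals the index length, while the paged approach walks the same index in
fixed-size slices.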
When {{TestMergeJoinWithSchemaChanges}} is run using the "managed" external
sort and union vectors, a downstream operator throws an index out of range
exception. However, when run with the "classic" external sort, no such
exception is thrown.
The difference is that the classic version returns all rows in a single batch,
while the managed version attempts to return rows in batches of a specified
size.
The paging approach works for tests that do not include union vectors, but
fails for those that do include them.
Modifying the managed version to return all results in a single batch does work.
The problem with this workaround is that, beyond a certain size, sorted results
cannot be returned in a single batch and paging becomes necessary. The sort
buffer can, for example, be set to 10 GB, which is too large for a single
batch. Or the sort can process more than 64K rows, which is also too many for a
single batch. In those scenarios, union vectors with SV4 will fail.
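The row-count limit can be illustrated with a little arithmetic. The sketch
below assumes a per-batch cap of 64K (65,536) rows, as stated above, and
computes how many output batches a sorted result needs. The names
{{BatchLimitSketch}}, {{MAX_ROWS_PER_BATCH}} and {{batchesNeeded}} are
hypothetical and serve only as illustration.

```java
final class BatchLimitSketch {
    // Assumed per-batch row cap: 64K rows.
    static final int MAX_ROWS_PER_BATCH = 64 * 1024;

    // Number of output batches needed to return rowCount sorted rows
    // when no batch may hold more than maxRowsPerBatch rows (ceiling division).
    static long batchesNeeded(long rowCount, int maxRowsPerBatch) {
        if (rowCount < 0 || maxRowsPerBatch <= 0) {
            throw new IllegalArgumentException("invalid row count or batch size");
        }
        return (rowCount + maxRowsPerBatch - 1) / maxRowsPerBatch;
    }

    // The single-batch workaround is only possible while this holds.
    static boolean fitsInSingleBatch(long rowCount) {
        return batchesNeeded(rowCount, MAX_ROWS_PER_BATCH) <= 1;
    }

    public static void main(String[] args) {
        long[] samples = { 1_000L, 65_536L, 65_537L, 1_000_000L };
        for (long rows : samples) {
            System.out.println(rows + " rows -> "
                + batchesNeeded(rows, MAX_ROWS_PER_BATCH) + " batch(es)");
        }
    }
}
```

Under this assumption, a result of 65,537 rows already needs a second batch,
which is the point at which the single-batch workaround stops applying.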
Since union vectors are not supported, the workaround described above is used
to get the test to pass. This ticket records the issue for a future time in
which we attempt to support union vectors.