GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/1161
DRILL-6230: Extend row set readers to handle hyper vectors
The current row set readers have incomplete support for hyper-vectors. To
add full support, we need an interface that supports either single batches or
hyper batches. Accessing vectors in hyper batches differs depending on whether
the vector is at the top level or is nested; see the post linked below for
details. This PR also includes a simpler reader template that replaces the
original three classes with one, in parallel with the writers.
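To illustrate why top-level and nested vectors are located differently in a hyper batch, here is a minimal, self-contained sketch. It is not the code in this PR. It assumes each selection entry packs a batch index in the upper 16 bits and a row offset in the lower 16 bits, and it models vectors as plain `int[]` arrays. The class `HyperBatchSketch` and all of its methods are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: models a hyper batch as one plain array per batch.
final class HyperBatchSketch {

    // Assumed packing: upper 16 bits = batch index, lower 16 bits = row offset.
    static int pack(int batch, int offset) {
        return (batch << 16) | (offset & 0xFFFF);
    }

    static int batchOf(int entry) {
        return entry >>> 16;
    }

    static int offsetOf(int entry) {
        return entry & 0xFFFF;
    }

    // Top-level column: the hyper vector is a stack of vectors, one per
    // batch, so the batch index selects the vector directly.
    static int readTopLevel(List<int[]> columnPerBatch, int entry) {
        return columnPerBatch.get(batchOf(entry))[offsetOf(entry)];
    }

    // Nested column: there is no stack for the child itself. First resolve
    // the parent map for the batch, then look up the child inside that map.
    static int readNested(List<Map<String, int[]>> mapPerBatch, String child, int entry) {
        Map<String, int[]> parent = mapPerBatch.get(batchOf(entry));
        return parent.get(child)[offsetOf(entry)];
    }
}
```

The nested case needs one extra lookup per level of nesting, which is the part that a vector accessor abstraction can hide from the reader itself.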
Key changes:
* Refactor the readers to generate just the required reader, then build up
the optional and repeated readers as layers on top of the generated reader.
This is the same structure that the writers already use.
* Add and test support for hyper-vectors.
* Extend the existing "vector accessor" abstraction to fully support the
highly complex process of locating nested vectors (those within a map or union)
in a hyper-batch.
* Introduce the idea of a "null state" abstraction to handle the messy null
handling in unions and repeated lists.
* Modify tests as needed for the new internal format of vector readers.
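The layering described in the first bullet, together with the "null state" idea, can be sketched roughly as follows. This is a simplified illustration, not the classes in this PR: `IntReader`, `NullState`, `RequiredIntReader`, `NullableIntReader`, and `RepeatedIntReader` are hypothetical names, and vectors are modeled as plain arrays.

```java
// Hypothetical abstraction: answers "is this row null?" without the reader
// knowing where the answer comes from (bits vector, union type, list, ...).
interface NullState {
    boolean isNull(int row);
}

interface IntReader {
    int getInt(int row);
    boolean isNull(int row);
}

// The only "generated" piece in this sketch: reads required (non-null) values.
final class RequiredIntReader implements IntReader {
    private final int[] values;

    RequiredIntReader(int[] values) {
        this.values = values;
    }

    @Override
    public int getInt(int row) {
        return values[row];
    }

    @Override
    public boolean isNull(int row) {
        return false; // required values are never null
    }
}

// Optional layer: wraps the required reader and delegates nullness to a NullState.
final class NullableIntReader implements IntReader {
    private final IntReader base;
    private final NullState nullState;

    NullableIntReader(IntReader base, NullState nullState) {
        this.base = base;
        this.nullState = nullState;
    }

    @Override
    public int getInt(int row) {
        return base.getInt(row);
    }

    @Override
    public boolean isNull(int row) {
        return nullState.isNull(row);
    }
}

// Repeated layer: wraps the same required reader and adds an offsets array.
final class RepeatedIntReader {
    private final IntReader elements;
    private final int[] offsets;

    RepeatedIntReader(IntReader elements, int[] offsets) {
        this.elements = elements;
        this.offsets = offsets;
    }

    int size(int row) {
        return offsets[row + 1] - offsets[row];
    }

    int getInt(int row, int index) {
        return elements.getInt(offsets[row] + index);
    }
}
```

With this shape, only the required reader depends on the vector type; the optional and repeated layers are written once and reused.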
To keep the PR from getting overly large, this PR strips out the actual
union and list support. That support will be added in a future PR. Similarly,
there are matching changes to writers that will also be done in a separate PR.
Other minor changes:
* Revise the previous utility PR. In some cases, it turns out to be
cleaner to use a separate `mapValue()` function instead of `objArray()`, even
though both produce an object array. Calling it `mapValue()` makes it a bit
clearer what we're trying to accomplish.
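As a rough illustration of the naming point, the stand-alone sketch below defines two helpers that both simply return their arguments as an `Object[]`. The class name `RowValueHelpers` and the sample values are assumptions made for this example; the actual utility signatures in the PR may differ.

```java
// Illustrative stand-ins: both helpers produce the same kind of value and
// differ only in the intent they communicate to the reader of a test.
final class RowValueHelpers {

    // Generic helper: an array of arbitrary objects.
    static Object[] objArray(Object... elements) {
        return elements;
    }

    // Same result, but the name says the array holds the members of a map.
    static Object[] mapValue(Object... members) {
        return members;
    }

    // Example row: an int column followed by a map column with two members.
    static Object[] sampleRow() {
        return objArray(10, mapValue("fred", 20));
    }
}
```

In the sample row, the reader can see at a glance which array is the row and which is the map.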
This PR is not needed for Drill 1.13; it can go into Drill 1.14.
See [this
post](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades) for
details of the end-state toward which this PR is one step.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-6230
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1161.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1161
----
commit e9891c561088d2c79ab1758dc857a8a52ec253ac
Author: Paul Rogers <progers@...>
Date: 2018-03-11T07:43:36Z
Accessor revisions
commit 6f6e3eb803793d71a5e8dba8362737bac66d923c
Author: Paul Rogers <progers@...>
Date: 2018-03-11T22:41:42Z
Merge of exec row set readers & tests
commit 65cd6205ea8e85ac4e001634ffa24268a57ce273
Author: Paul Rogers <progers@...>
Date: 2018-03-12T00:23:35Z
Fixed tests to remove work not in this PR
----