Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/914
Regarding the use of memory addresses: the only reason to use them is
performance. To show the benefit, I reran the `PerformanceTool` class
to test the original code, the code using addresses, and a version that
uses `DrillBuf` as @parthchandra suggested. I expected addresses to be
the clear winner. That's not at all what happened.
The code contains a class, `PerformanceTool`, that compares the column
writers with the original vector mutators. Each run loads a vector to 16 MB
in size, and the run is repeated 300 times. The following are the run times, in ms.
Vector Type | Original | New w/Address | New w/DrillBuf
------------ | -------- | ------------ | -------------
Required | 5703 | 4034 | 1461
Nullable | 12743 | 3645 | 3411
Repeated | 20430 | 7226 | 2669
Here:
* "Original" shows the original int vector mutator class.
* "New w/Address" shows the same exercise, using the version of the vector
writers based on a direct memory address.
* "New w/DrillBuf" shows the vector writers, but using the technique Parth
suggested: adding "unsafe" methods to the `DrillBuf` class.
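To make the idea of "unsafe" methods concrete, here is a minimal sketch of a buffer wrapper with one checked setter and one setter that skips the wrapper's own check. The class and method names (`CheckedBufSketch`, `setInt`, `unsafeSetInt`) are hypothetical and are not taken from `DrillBuf`. The sketch is backed by a plain `java.nio.ByteBuffer`, which still does its own internal checks, so it shows only the shape of the API, not its speed.

```java
import java.nio.ByteBuffer;

/** Hypothetical buffer wrapper; names are illustrative, not DrillBuf's API. */
public class CheckedBufSketch {
  private final ByteBuffer data;
  private int checks; // counts wrapper-level bounds checks, for illustration

  public CheckedBufSketch(int capacity) {
    this.data = ByteBuffer.allocateDirect(capacity);
  }

  /** Checked write: the wrapper validates the offset on every call. */
  public void setInt(int offset, int value) {
    checks++;
    if (offset < 0 || offset > data.capacity() - Integer.BYTES) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    data.putInt(offset, value);
  }

  /**
   * "Unsafe" write: skips the wrapper's check and trusts the caller.
   * (ByteBuffer still checks internally here; a raw-address write would not.)
   */
  public void unsafeSetInt(int offset, int value) {
    data.putInt(offset, value);
  }

  public int getInt(int offset) {
    return data.getInt(offset);
  }

  public int checksPerformed() {
    return checks;
  }

  public static void main(String[] args) {
    CheckedBufSketch buf = new CheckedBufSketch(1024);
    int count = 1024 / Integer.BYTES;
    // The caller knows the valid range up front, so it writes unchecked.
    for (int i = 0; i < count; i++) {
      buf.unsafeSetInt(i * Integer.BYTES, i);
    }
    System.out.println(buf.getInt(40) + " " + buf.checksPerformed());
  }
}
```

The trade-off in a design like this is that the caller must guarantee the offset is valid before calling the unchecked method.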
The test is run with a pre-allocated vector (no double-and-copy
operations). See `PerformanceTool` for details.
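For readers without the source at hand, the general shape of such a test can be sketched as follows. This is not the actual `PerformanceTool`: the class `FillTimingSketch` and its methods are hypothetical, and a direct `java.nio.ByteBuffer` stands in for a Drill vector. It assumes only what is stated above: int values, a 16 MB pre-allocated buffer, and 300 repetitions.

```java
import java.nio.ByteBuffer;

/** Hypothetical timing harness; a stand-in, not Drill's PerformanceTool. */
public class FillTimingSketch {

  /** Writes ints 0..n-1 into the buffer and returns how many were written. */
  public static int fill(ByteBuffer buf) {
    int count = buf.capacity() / Integer.BYTES;
    for (int i = 0; i < count; i++) {
      buf.putInt(i * Integer.BYTES, i); // absolute write at a computed offset
    }
    return count;
  }

  /** Fills one pre-allocated buffer `repeats` times and returns elapsed ms. */
  public static long timeFills(int bufferBytes, int repeats) {
    // Allocate once, outside the timed region, so no growth cost is measured.
    ByteBuffer buf = ByteBuffer.allocateDirect(bufferBytes);
    long written = 0;
    long start = System.nanoTime();
    for (int r = 0; r < repeats; r++) {
      written += fill(buf);
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    // Consume the results so the JIT cannot discard the writes as dead code.
    long expected = (long) repeats * (bufferBytes / Integer.BYTES);
    if (written != expected || buf.getInt(0) != 0) {
      throw new IllegalStateException("unexpected buffer contents");
    }
    return elapsedMs;
  }

  public static void main(String[] args) {
    // 16 MB and 300 repetitions mirror the figures quoted above.
    System.out.println(timeFills(16 * 1024 * 1024, 300) + " ms");
  }
}
```

Timings from this sketch are not comparable to the table, since it exercises neither Drill's vectors nor its writers. A single un-warmed run also mixes JIT compilation time into the result.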
I have no explanation for why the `DrillBuf` version should be faster at
all, let alone far faster, but I'll take it. The latest commit contains the
code after this revision.
So, thank you Parth, you were right again with what turned out to be an
outstanding performance boost.
---