[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

GitBox Tue, 04 Jan 2022 04:04:50 -0800


jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004751568



   Paul Rogers wrote:
   
   One last note. Let's assume we wanted to adopt the row-based format (or, the 
myths being strong, that we wanted to adopt Arrow). How would we go about it?
   
   The "brute force" approach is to rewrite all the operators. Each operator 
works directly with low-level vector code, so we'd rewrite that code with 
low-level row (or Arrow) code. Since we can't really test until all operators 
are converted, we'd have to do the entire conversion in one huge effort, and 
only then start debugging. I hope this approach is setting off alarm bells: it 
is high cost and high risk. This is why Drill never seriously entertained the 
change.
   
   But there is another solution. The scan readers all used to work directly 
with vectors (Parquet still does). For the memory reasons explained earlier, we 
converted most of them to use EVF (the Enhanced Vector Framework). As a result, 
we could swap vectors for row pages (or Arrow) by changing only the low-level 
code. Readers would be blissfully ignorant of such changes because the 
higher-level abstractions would be unchanged.
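To make the layering concrete, here is a minimal Java sketch. It assumes a hypothetical `RowWriter` interface; none of these names are Drill's actual EVF API. A toy reader writes rows through the interface, and two interchangeable backends store them either column-by-column (vector style) or row-by-row (row-page style). The sketch also assumes integer-only columns with no nulls and no memory management, unlike real vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EvfLayeringSketch {

    // Hypothetical high-level writer: the only API a scan reader sees.
    interface RowWriter {
        void startRow();
        void setInt(int column, int value);
        void saveRow();
        int rowCount();
        int getInt(int row, int column); // read-back, used only to check results
    }

    // Backend 1: column-oriented storage, one list per column (vector style).
    static class ColumnarWriter implements RowWriter {
        private final List<List<Integer>> columns = new ArrayList<>();
        private final int[] pending;
        private int rows;

        ColumnarWriter(int columnCount) {
            pending = new int[columnCount];
            for (int i = 0; i < columnCount; i++) {
                columns.add(new ArrayList<>());
            }
        }

        public void startRow() { Arrays.fill(pending, 0); }
        public void setInt(int column, int value) { pending[column] = value; }
        public void saveRow() {
            // Scatter the buffered row across the per-column lists.
            for (int c = 0; c < pending.length; c++) {
                columns.get(c).add(pending[c]);
            }
            rows++;
        }
        public int rowCount() { return rows; }
        public int getInt(int row, int column) { return columns.get(column).get(row); }
    }

    // Backend 2: row-oriented storage, one array per row (row-page style).
    static class RowPageWriter implements RowWriter {
        private final List<int[]> rows = new ArrayList<>();
        private final int columnCount;
        private int[] current;

        RowPageWriter(int columnCount) { this.columnCount = columnCount; }

        public void startRow() { current = new int[columnCount]; }
        public void setInt(int column, int value) { current[column] = value; }
        public void saveRow() { rows.add(current); }
        public int rowCount() { return rows.size(); }
        public int getInt(int row, int column) { return rows.get(row)[column]; }
    }

    // A toy "scan reader": writes (i, i * i) rows without knowing which backend it uses.
    static void readInto(RowWriter writer, int n) {
        for (int i = 0; i < n; i++) {
            writer.startRow();
            writer.setInt(0, i);
            writer.setInt(1, i * i);
            writer.saveRow();
        }
    }

    public static void main(String[] args) {
        for (RowWriter w : new RowWriter[] {new ColumnarWriter(2), new RowPageWriter(2)}) {
            readInto(w, 4);
            System.out.println(w.getClass().getSimpleName() + ": rows=" + w.rowCount()
                + ", last=(" + w.getInt(3, 0) + ", " + w.getInt(3, 1) + ")");
        }
    }
}
```

Both backends report 4 rows with a last row of (3, 9). The `readInto` code is identical in each case, and that is the property that would let the storage layer change underneath the readers.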
   
   So, a more sane way to approach a change of in-memory representations is to 
first convert the other operators to use an EVF-like approach. (EVF for writing 
new batches, a "Result Set Loader" for reading existing batches.) Such a change 
can be done gradually, operator-by-operator, and is fully compatible with 
other, non-converted operators. No big bang.
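As a rough illustration of why the conversion need not be a big bang, the Java sketch below keeps today's column-vector batch as the interchange format. Every class and method name here (`VectorBatch`, `RowReader`, `RowWriter`, and the two operators) is hypothetical and is not Drill's API. One operator is "legacy" and indexes the vectors directly. The other is "converted" and uses only row-at-a-time reader and writer facades. Because both still exchange the same batch type, they can be chained in one pipeline.

```java
import java.util.ArrayList;
import java.util.List;

public class GradualConversionSketch {

    // Today's batch format: plain column arrays (hypothetical stand-in for value vectors).
    static final class VectorBatch {
        final int[][] columns; // columns[c][r]; only the first rowCount entries are valid
        final int rowCount;
        VectorBatch(int[][] columns, int rowCount) { this.columns = columns; this.rowCount = rowCount; }
    }

    // Hypothetical row-at-a-time reader facade over an existing batch.
    static final class RowReader {
        private final VectorBatch batch;
        private int row = -1;
        RowReader(VectorBatch batch) { this.batch = batch; }
        boolean next() { return ++row < batch.rowCount; }
        int getInt(int column) { return batch.columns[column][row]; }
    }

    // Hypothetical row-at-a-time writer facade that still produces a column batch.
    static final class RowWriter {
        private final List<int[]> rows = new ArrayList<>();
        private final int columnCount;
        RowWriter(int columnCount) { this.columnCount = columnCount; }
        void writeRow(int... values) { rows.add(values.clone()); }
        VectorBatch harvest() {
            // Transpose the buffered rows into per-column arrays.
            int[][] cols = new int[columnCount][rows.size()];
            for (int r = 0; r < rows.size(); r++) {
                for (int c = 0; c < columnCount; c++) cols[c][r] = rows.get(r)[c];
            }
            return new VectorBatch(cols, rows.size());
        }
    }

    // Not yet converted: keeps rows whose first column is even by indexing vectors directly.
    static VectorBatch legacyFilterEven(VectorBatch in) {
        int[][] out = new int[in.columns.length][in.rowCount];
        int n = 0;
        for (int r = 0; r < in.rowCount; r++) {
            if (in.columns[0][r] % 2 == 0) {
                for (int c = 0; c < in.columns.length; c++) out[c][n] = in.columns[c][r];
                n++;
            }
        }
        return new VectorBatch(out, n);
    }

    // Converted: sums the first two columns using only the facades, never the vectors.
    static VectorBatch convertedSum(VectorBatch in) {
        RowReader reader = new RowReader(in);
        RowWriter writer = new RowWriter(1);
        while (reader.next()) {
            writer.writeRow(reader.getInt(0) + reader.getInt(1));
        }
        return writer.harvest();
    }

    public static void main(String[] args) {
        VectorBatch input = new VectorBatch(new int[][] {{1, 2, 3, 4}, {10, 20, 30, 40}}, 4);
        VectorBatch out = convertedSum(legacyFilterEven(input));
        for (int r = 0; r < out.rowCount; r++) System.out.println(out.columns[0][r]);
    }
}
```

Running `main` prints 22 and 44. The legacy filter keeps the rows (2, 20) and (4, 40), and the converted operator sums each pair. Once every operator looks like `convertedSum`, only the facades and the batch class would need to change to try a different in-memory layout.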
   
   Once everything is upgraded to EVF, we can swap out the in-memory format. 
Maybe try Arrow. Try a row-based format. Run the tests. Pick the winner.
   
   This is *not* a trivial exercise, but it is doable over time, if we see 
value and can muster the resources.


