[ https://issues.apache.org/jira/browse/HIVE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811796#comment-13811796 ]
Eric Hanson commented on HIVE-5397:
-----------------------------------
Hi Brock,
I'm in favor of encapsulation for most code. But this is different because this
is a low-level performance enhancement project that has some research behind
it. The theory behind the vectorized query execution technique that we use was
published in this paper:
Peter Boncz et al., MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings
of the CIDR Conference, 2005.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C26BD72358252F6A301DA1FF6E37D44B?doi=10.1.1.324.9516&rep=rep1&type=pdf
Please see the performance numbers in the paper.
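To make the idea concrete, here is a minimal Java sketch contrasting the two
execution styles. All class and method names are hypothetical and are not
Hive's API. Tuple-at-a-time evaluation pays a method call for every row, while
vector-at-a-time evaluation makes one call per batch and runs a tight loop over
a primitive array. It illustrates the technique from the paper; it is not a
benchmark.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch contrasting tuple-at-a-time and vector-at-a-time
 * evaluation of the expression "col + 5". Not Hive's API.
 */
public class VectorVsTupleDemo {

    /** Tuple-at-a-time: the engine calls evaluate() once per row. */
    interface RowExpression {
        long evaluate(long value);
    }

    static final class AddScalarRow implements RowExpression {
        private final long scalar;

        AddScalarRow(long scalar) {
            this.scalar = scalar;
        }

        @Override
        public long evaluate(long value) {
            return value + scalar;
        }
    }

    static long[] evaluateTupleAtATime(RowExpression expr, long[] input) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = expr.evaluate(input[i]); // call overhead paid per row
        }
        return out;
    }

    /** Vector-at-a-time: one call per batch, no calls inside the inner loop. */
    static void addScalarVector(long[] input, long scalar, long[] out, int size) {
        for (int i = 0; i < size; i++) {
            out[i] = input[i] + scalar;
        }
    }

    public static void main(String[] args) {
        long[] column = {1, 2, 3, 4};
        long[] vectorOut = new long[column.length];
        addScalarVector(column, 5, vectorOut, column.length);
        long[] tupleOut = evaluateTupleAtATime(new AddScalarRow(5), column);
        System.out.println(Arrays.toString(vectorOut)); // [6, 7, 8, 9]
        System.out.println(Arrays.equals(vectorOut, tupleOut)); // true
    }
}
```

Both paths compute the same values; the difference the paper measures is how
much per-row overhead sits around the arithmetic.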
State-of-the-art query execution engines, such as those in Microsoft SQL Server,
Vectorwise, Vertica, and ParAccel/Redshift (in no particular order), all use
this strategy or something like it. It's well known in the industry that query
execution is a place where being architecture-conscious (keeping inner loops
tight and friendly to the CPU cache and pipeline) pays big dividends, and that
requires some violation of encapsulation.
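A rough sketch of what this looks like in practice, assuming simplified
stand-in classes whose field names loosely follow Hive's VectorizedRowBatch and
LongColumnVector (these are not the real Hive classes): the batch exposes its
primitive arrays and a selection vector as public fields, so a filter can read
the column array directly and rewrite the list of surviving row indices in one
tight loop, with no per-row method calls.

```java
/**
 * Simplified, hypothetical stand-ins loosely modeled on the public-field
 * style of Hive's VectorizedRowBatch / LongColumnVector. Not the Hive API.
 */
public class PublicBatchFilterDemo {

    static final class LongColumn {
        public long[] vector;
        public boolean isRepeating; // if true, only vector[0] is meaningful

        LongColumn(int capacity) {
            vector = new long[capacity];
        }
    }

    static final class Batch {
        public LongColumn[] cols;
        public int size;              // number of live rows
        public int[] selected;        // live row indices when selectedInUse
        public boolean selectedInUse;

        Batch(int numCols, int capacity) {
            cols = new LongColumn[numCols];
            for (int c = 0; c < numCols; c++) {
                cols[c] = new LongColumn(capacity);
            }
            selected = new int[capacity];
        }
    }

    /** Keeps rows where cols[colNum] > threshold, rewriting selected in place. */
    static void filterGreaterThan(Batch batch, int colNum, long threshold) {
        long[] v = batch.cols[colNum].vector; // direct field and array access
        int n = batch.size;
        if (n == 0) {
            return;
        }
        if (batch.cols[colNum].isRepeating) {
            // One comparison decides the whole batch.
            if (v[0] <= threshold) {
                batch.size = 0;
            }
            return;
        }
        int[] sel = batch.selected;
        int newSize = 0;
        if (batch.selectedInUse) {
            for (int j = 0; j < n; j++) {
                int i = sel[j];           // read before any write at index >= j
                if (v[i] > threshold) {
                    sel[newSize++] = i;   // newSize <= j, so this is safe in place
                }
            }
        } else {
            for (int i = 0; i < n; i++) {
                if (v[i] > threshold) {
                    sel[newSize++] = i;
                }
            }
            batch.selectedInUse = true;
        }
        batch.size = newSize;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(1, 8);
        long[] data = {5, 12, 7, 20, 3};
        System.arraycopy(data, 0, batch.cols[0].vector, 0, data.length);
        batch.size = data.length;
        filterGreaterThan(batch, 0, 6);
        for (int j = 0; j < batch.size; j++) {
            int i = batch.selectedInUse ? batch.selected[j] : j;
            System.out.println(i + " -> " + batch.cols[0].vector[i]);
        }
        // prints "1 -> 12", "2 -> 7", "3 -> 20"
    }
}
```

Making these fields private would mean routing every read in the inner loop
through an accessor, which is exactly the overhead the public layout avoids.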
It is possible that the JIT compiler would inline some method calls for us in
the inner loops of some of the vectorized "for" loops, but for the most
primitive operations, like arithmetic and comparisons, relying on the compiler
is too much of a risk in most cases. Arguably, using put/get methods to access
columns, rather than the direct array access we use in our VectorExpression
subclasses, would not lose much performance. But we already decided to use
array access to get at columns, and it is used in hundreds of places in the
code. I think it is a reasonable choice, and it is not necessary to change it.
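The two access styles under discussion might look like this in a hypothetical
column class (not Hive's code): direct access to a public array field, and
get/set accessors. Both compute the same result; the accessor version only
matches the direct version's speed if the JIT inlines get and set at that call
site.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: the same "multiply by scalar" expression written with
 * direct public-array access and with accessor methods.
 */
public class AccessStyleDemo {

    static final class LongColumn {
        public final long[] vector;

        LongColumn(int capacity) {
            vector = new long[capacity];
        }

        long get(int i) {
            return vector[i];
        }

        void set(int i, long value) {
            vector[i] = value;
        }
    }

    /** Direct array access: the inner loop is just loads, a multiply, and stores. */
    static void multiplyDirect(LongColumn in, LongColumn out, long scalar, int size) {
        long[] src = in.vector;
        long[] dst = out.vector;
        for (int i = 0; i < size; i++) {
            dst[i] = src[i] * scalar;
        }
    }

    /** Accessor style: as fast only if the JIT inlines get/set here. */
    static void multiplyViaAccessors(LongColumn in, LongColumn out, long scalar, int size) {
        for (int i = 0; i < size; i++) {
            out.set(i, in.get(i) * scalar);
        }
    }

    public static void main(String[] args) {
        int n = 4;
        LongColumn in = new LongColumn(n);
        for (int i = 0; i < n; i++) {
            in.vector[i] = i + 1;
        }
        LongColumn a = new LongColumn(n);
        LongColumn b = new LongColumn(n);
        multiplyDirect(in, a, 3, n);
        multiplyViaAccessors(in, b, 3, n);
        System.out.println(Arrays.toString(a.vector)); // [3, 6, 9, 12]
        System.out.println(Arrays.equals(a.vector, b.vector)); // true
    }
}
```

HotSpot typically inlines trivial accessors like these, but inlining is a
heuristic decision (it depends on call-site profiles and on size and depth
budgets), so it is not guaranteed. One way to check what actually happened is
to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining.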
-Eric
> VectorizedRowBatch member variables are public.
> -----------------------------------------------
>
> Key: HIVE-5397
> URL: https://issues.apache.org/jira/browse/HIVE-5397
> Project: Hive
> Issue Type: Sub-task
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
>
> VectorizedRowBatch exposes its members as public to avoid method call overhead.
> The alternative is to rely on the JIT to inline the methods.