Re: Will Spark-SQL support vectorized query engine someday?

Xuelin Cao Tue, 20 Jan 2015 00:31:44 -0800

Thanks, Reynold

      Regarding the "lower hanging fruits", can you give me some examples?
Where can I find them in JIRA?



On Tue, Jan 20, 2015 at 3:55 PM, Reynold Xin <[email protected]> wrote:

> It will probably eventually make its way into part of the query engine,
> one way or another. Note that there are in general a lot of other lower
> hanging fruits before you have to do vectorization.
>
> As far as I know, Hive doesn't really have true vectorization: what Hive
> calls vectorization is simply processing everything in small batches, in
> order to avoid the virtual function call overhead, and hoping the JVM can
> unroll some of the loops. There is no SIMD involved.
>
> Something that is pretty useful, which isn't exactly from vectorization
> but comes from similar lines of research, is being able to push predicates
> down into the columnar compression encoding. For example, one can turn
> string comparisons into integer comparisons. These will probably give much
> larger performance improvements in common queries.
>
>
> On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao <[email protected]>
> wrote:
>
>> Hi,
>>
>>      Correct me if I am wrong. It looks like the current version of
>> Spark-SQL uses a *tuple-at-a-time* model. Basically, each physical
>> operator produces one tuple at a time by recursively calling
>> child->execute.
>>
>>      There are papers that illustrate the benefits of a vectorized query
>> engine, and Hive-Stinger also embraces this style.
>>
>>      So, the question is, will Spark-SQL support vectorized query
>> execution someday?
>>
>>      Thanks
>>
>
>
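
To make the tuple-at-a-time versus batch-at-a-time distinction in the quoted
messages concrete, here is a minimal Python sketch. It is only an
illustration, not Spark SQL or Hive code: the function names (scan_rows,
filter_rows, scan_batches, filter_batches) and the batch size of 4 are
hypothetical. In the first pipeline every row crosses an operator boundary
on its own; in the second, one operator call handles a whole batch, which is
the kind of per-row call overhead the batching approach is meant to amortize.

```python
from typing import Callable, Iterator, List


def scan_rows(data: List[int]) -> Iterator[int]:
    """Tuple-at-a-time scan: hands one row at a time to its parent."""
    for row in data:
        yield row


def filter_rows(child: Iterator[int],
                predicate: Callable[[int], bool]) -> Iterator[int]:
    """Tuple-at-a-time filter: pulls a single row from its child per step."""
    for row in child:
        if predicate(row):
            yield row


def scan_batches(data: List[int], batch_size: int) -> Iterator[List[int]]:
    """Batch-at-a-time scan: hands a small list of rows to its parent."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def filter_batches(child: Iterator[List[int]],
                   predicate: Callable[[int], bool]) -> Iterator[List[int]]:
    """Batch-at-a-time filter: one operator call processes a whole batch."""
    for batch in child:
        selected = [value for value in batch if predicate(value)]
        if selected:  # skip batches where no row survived the filter
            yield selected


def is_even(value: int) -> bool:
    return value % 2 == 0


data = list(range(10))

# Row pipeline: one operator boundary crossing per row.
row_result = list(filter_rows(scan_rows(data), is_even))

# Batch pipeline: one operator boundary crossing per batch of up to 4 rows.
batch_result = [value
                for batch in filter_batches(scan_batches(data, 4), is_even)
                for value in batch]
```

Both pipelines return the same rows; only the granularity of the calls
between operators differs. No SIMD is involved in either version.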

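
The point about turning string comparisons into integer comparisons can be
sketched in the same way. The example below assumes a dictionary-encoded
string column, which is one common columnar encoding; the helper names
(dictionary_encode, filter_equals) and the sample data are hypothetical and
do not come from Spark SQL. The predicate literal is looked up in the
dictionary once, and the scan over the rows then compares integer codes only.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Encode a string column as (sorted dictionary, integer codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int],
                  literal: str) -> List[int]:
    """Return row indices where the column equals `literal`.

    The string comparison happens once, against the dictionary. The scan
    over the rows compares integer codes only.
    """
    if literal not in dictionary:
        return []  # the literal never occurs, so no row can match
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical sample column.
column = ["us", "cn", "us", "de", "cn", "us"]
dictionary, codes = dictionary_encode(column)
matches = filter_equals(dictionary, codes, "us")
```

A literal that is missing from the dictionary can be rejected without
scanning any rows at all, which is a further benefit of evaluating the
predicate against the encoding.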