I don't know if there is a list, but in general, running a performance
profiler can identify a lot of things...

On Tue, Jan 20, 2015 at 12:30 AM, Xuelin Cao <xuelincao2...@gmail.com>
wrote:

>
> Thanks, Reynold
>
>       Regarding the "lower-hanging fruit", can you give me some examples?
> Where can I find them in JIRA?
>
>
> On Tue, Jan 20, 2015 at 3:55 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> It will probably eventually make its way into part of the query engine,
>> one way or another. Note that, in general, there is a lot of other
>> lower-hanging fruit to pick before you need to do vectorization.
>>
>> As far as I know, Hive doesn't really have vectorization: what Hive calls
>> vectorization is simply processing everything in small batches, to avoid
>> the virtual function call overhead, and hoping the JVM can unroll some of
>> the loops. There is no SIMD involved.
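To make the batching idea concrete, here is a minimal Java sketch contrasting the two styles. It is not Hive or Spark code; `RowPredicate`, `BatchFilter`, and `GreaterThanBatch` are hypothetical names. The row-at-a-time version makes one interface call per value, while the batch version makes one call per batch and runs a tight loop over a primitive array. Whether the JIT actually unrolls that loop depends on the JVM, and nothing here uses SIMD explicitly.

```java
// Hypothetical types for illustration only; not Hive or Spark APIs.
public class BatchVsRow {
    // Row-at-a-time: one interface (virtual) call per value.
    interface RowPredicate { boolean test(long v); }

    // Batch-at-a-time: one interface call per batch of values.
    interface BatchFilter { int filter(long[] values, int n, int[] selected); }

    static int countRowAtATime(long[] col, RowPredicate p) {
        int count = 0;
        for (long v : col) {
            if (p.test(v)) count++; // dispatch on every row
        }
        return count;
    }

    static final class GreaterThanBatch implements BatchFilter {
        private final long threshold;
        GreaterThanBatch(long threshold) { this.threshold = threshold; }

        @Override
        public int filter(long[] values, int n, int[] selected) {
            int k = 0;
            // Tight loop over a primitive array: a candidate for JIT unrolling,
            // but there is no explicit SIMD here.
            for (int i = 0; i < n; i++) {
                if (values[i] > threshold) selected[k++] = i;
            }
            return k; // number of qualifying positions written to `selected`
        }
    }

    static int countBatchAtATime(long[] col, BatchFilter f, int batchSize) {
        long[] batch = new long[batchSize];
        int[] selected = new int[batchSize];
        int count = 0;
        for (int start = 0; start < col.length; start += batchSize) {
            int n = Math.min(batchSize, col.length - start);
            System.arraycopy(col, start, batch, 0, n);
            count += f.filter(batch, n, selected); // dispatch once per batch
        }
        return count;
    }

    public static void main(String[] args) {
        long[] col = new long[10_000];
        for (int i = 0; i < col.length; i++) col[i] = i % 100;
        System.out.println(countRowAtATime(col, v -> v > 50));                        // 4900
        System.out.println(countBatchAtATime(col, new GreaterThanBatch(50), 1024)); // 4900
    }
}
```

Both paths compute the same answer; the difference is only in how often the call boundary is crossed.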
>>
>> Something that is pretty useful, which isn't exactly vectorization but
>> comes from similar lines of research, is being able to push predicates
>> down into the columnar compression encoding. For example, one can turn
>> string comparisons into integer comparisons. This will probably give much
>> larger performance improvements on common queries.
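One common way to turn string comparisons into integer comparisons is dictionary encoding. The message doesn't name a specific encoding, so the sketch below assumes that technique and uses a hypothetical `DictEncodedColumn` class rather than any Spark API. An equality predicate looks up the literal once in the small dictionary, then checks each row with an `int` comparison instead of `String.equals`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical dictionary-encoded string column; illustration only, not a Spark API.
public class DictEncodedColumn {
    final String[] dictionary; // distinct values; code i stands for dictionary[i]
    final int[] codes;         // one small integer per row

    DictEncodedColumn(String[] rows) {
        Map<String, Integer> ids = new HashMap<>();
        codes = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            Integer id = ids.get(rows[i]);
            if (id == null) {
                id = ids.size(); // next unused code
                ids.put(rows[i], id);
            }
            codes[i] = id;
        }
        dictionary = new String[ids.size()];
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            dictionary[e.getValue()] = e.getKey();
        }
    }

    // Evaluate `col = literal` without decoding rows: one string lookup in
    // the dictionary, then integer comparisons on every row.
    int[] selectEquals(String literal) {
        int target = -1;
        for (int i = 0; i < dictionary.length; i++) {
            if (dictionary[i].equals(literal)) { target = i; break; }
        }
        if (target < 0) return new int[0]; // literal absent: no row can match
        int[] out = new int[codes.length];
        int k = 0;
        for (int row = 0; row < codes.length; row++) {
            if (codes[row] == target) out[k++] = row; // int compare, not String.equals
        }
        return Arrays.copyOf(out, k);
    }

    public static void main(String[] args) {
        DictEncodedColumn col = new DictEncodedColumn(
            new String[] {"US", "CN", "US", "DE", "CN", "US"});
        System.out.println(Arrays.toString(col.selectEquals("US"))); // [0, 2, 5]
        System.out.println(Arrays.toString(col.selectEquals("FR"))); // []
    }
}
```

A side benefit visible in the sketch: if the literal is not in the dictionary, the predicate is answered without touching any rows.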
>>
>>
>> On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao <xuelincao2...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>>      Correct me if I'm wrong. It looks like the current version of
>>> Spark-SQL uses a *tuple-at-a-time* model. Basically, each physical
>>> operator produces one tuple at a time by recursively calling
>>> child->execute.
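For readers unfamiliar with the term, a tuple-at-a-time (Volcano-style) pipeline looks roughly like the following minimal Java sketch. The `Operator`, `Scan`, `Filter`, and `Project` names are hypothetical and do not correspond to Spark-SQL's actual classes. The point is that each `next()` call pulls a single row from its child, so every row pays for one call per operator in the plan.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical operators for illustration; Spark-SQL's real physical operators differ.
public class TupleAtATime {
    interface Operator { Object[] next(); } // returns null when exhausted

    static final class Scan implements Operator {
        private final Iterator<Object[]> rows;
        Scan(List<Object[]> rows) { this.rows = rows.iterator(); }
        public Object[] next() { return rows.hasNext() ? rows.next() : null; }
    }

    static final class Filter implements Operator {
        private final Operator child;
        private final Predicate<Object[]> pred;
        Filter(Operator child, Predicate<Object[]> pred) { this.child = child; this.pred = pred; }
        public Object[] next() {
            Object[] row;
            while ((row = child.next()) != null) { // pull one tuple from the child
                if (pred.test(row)) return row;
            }
            return null;
        }
    }

    static final class Project implements Operator {
        private final Operator child;
        private final int column;
        Project(Operator child, int column) { this.child = child; this.column = column; }
        public Object[] next() {
            Object[] row = child.next(); // again, one tuple per call
            return row == null ? null : new Object[] { row[column] };
        }
    }

    // The root is driven one tuple at a time until the pipeline is exhausted.
    static List<Object> run(Operator root) {
        List<Object> out = new ArrayList<>();
        Object[] row;
        while ((row = root.next()) != null) out.add(row[0]);
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> table = List.of(
            new Object[] {"a", 1}, new Object[] {"b", 5}, new Object[] {"c", 7});
        Operator plan = new Project(new Filter(new Scan(table), r -> (Integer) r[1] > 2), 0);
        System.out.println(run(plan)); // [b, c]
    }
}
```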
>>>
>>>      There are papers that illustrate the benefits of a vectorized query
>>> engine, and Hive-Stinger has also embraced this style.
>>>
>>>      So the question is: will Spark-SQL support vectorized query
>>> execution someday?
>>>
>>>      Thanks
>>>
>>
>>
>
