Thanks, Reynold.

Regarding the "lower hanging fruits", can you give me some examples? Where can I find them in JIRA?

On Tue, Jan 20, 2015 at 3:55 PM, Reynold Xin <r...@databricks.com> wrote:

> It will probably eventually make its way into part of the query engine,
> one way or another. Note that there are in general a lot of other lower
> hanging fruits before you have to do vectorization.
>
> As far as I know, Hive doesn't really have vectorization because the
> vectorization in Hive is simply writing everything in small batches, in
> order to avoid the virtual function call overhead, and hoping the JVM can
> unroll some of the loops. There is no SIMD involved.
>
> Something that is pretty useful, which isn't exactly from vectorization
> but comes from similar lines of research, is being able to push predicates
> down into the columnar compression encoding. For example, one can turn
> string comparisons into integer comparisons. These will probably give much
> larger performance improvements in common queries.
>
> On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao <xuelincao2...@gmail.com> wrote:
>
>> Hi,
>>
>> Correct me if I am wrong. It looks like the current version of
>> Spark-SQL uses a *tuple-at-a-time* model. Basically, each time, the
>> physical operator produces a tuple by recursively calling child->execute.
>>
>> There are papers that illustrate the benefits of a vectorized query
>> engine, and Hive-Stinger also embraces this style.
>>
>> So, the question is: will Spark-SQL support vectorized query execution
>> someday?
>>
>> Thanks
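
For reference, the point in the quoted reply about pushing predicates down into the columnar compression encoding can be sketched roughly as follows. This is only a minimal illustration in plain Python, assuming a simple dictionary encoding of a string column. It is not how Spark SQL or Hive implement it, and the names `dictionary_encode` and `filter_equals` are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Encode a string column as (sorted dictionary, integer codes per row)."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    codes = [index[value] for value in values]
    return dictionary, codes


def filter_equals(dictionary, codes, literal):
    """Return row positions where the column equals `literal`.

    The string literal is looked up in the dictionary once (binary search);
    after that, every per-row check is an integer comparison.
    """
    pos = bisect_left(dictionary, literal)
    if pos == len(dictionary) or dictionary[pos] != literal:
        # Literal is not in the dictionary, so no row can match.
        return []
    return [row for row, code in enumerate(codes) if code == pos]


# Hypothetical column data, for illustration only.
city = ["Paris", "Oslo", "Paris", "Lima", "Oslo", "Paris"]
dictionary, codes = dictionary_encode(city)
matches = filter_equals(dictionary, codes, "Paris")
```

The filter touches strings only during the single dictionary lookup. The scan over `codes` compares integers, which is the "string comparisons into integer comparisons" idea; a literal missing from the dictionary lets the scan be skipped entirely.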