[https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718932#comment-13718932]
Jitendra Nath Pandey commented on HIVE-4822:
--------------------------------------------
bq. How does explain work with the vectorization engine?
'explain' continues to work as before and returns the same plan as in
non-vector mode. Vectorization executes exactly the same query plan; only the
implementation of the operators and expressions has been changed to run in
vectorized fashion.
However, we do plan to enhance 'explain' to also show which operators will be
executed in vectorized mode. We will start working on this very soon and file a
JIRA.
In the current implementation, we don't need 'explain' annotations on
vectorized UDFs, because the vectorized UDFs are chosen only at run time. The
query planner uses only row-mode UDFs; at query execution time, if
vectorization is possible, we switch to the corresponding vectorized UDFs. We
adopted this approach to avoid any changes to the query planner for
vectorization.
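The run-time switch described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Hive's actual VectorizationContext code: the class `RuntimeUdfSwitch`, the `VectorExpression` interface, and the registry contents are all assumptions. The plan keeps naming the row-mode UDF, and the lookup for a vectorized replacement happens only when execution starts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the query plan is left untouched and still names the
// row-mode UDF; a vectorized replacement is looked up only at execution time.
class RuntimeUdfSwitch {

    /** A vectorized expression processes a whole batch of rows per call. */
    interface VectorExpression {
        void evaluate(double[] in, double[] out, int size);
    }

    // Hypothetical registry: row-mode UDF name -> vectorized implementation.
    private static final Map<String, VectorExpression> VECTORIZED = new HashMap<>();

    static {
        VECTORIZED.put("sqrt", (in, out, n) -> {
            for (int i = 0; i < n; i++) {
                out[i] = Math.sqrt(in[i]);
            }
        });
        VECTORIZED.put("abs", (in, out, n) -> {
            for (int i = 0; i < n; i++) {
                out[i] = Math.abs(in[i]);
            }
        });
    }

    /**
     * Called when execution starts, with the UDF name taken from the plan.
     * An empty result means no vectorized version exists, so the operator
     * stays in row mode.
     */
    static Optional<VectorExpression> vectorize(String rowModeUdfName) {
        return Optional.ofNullable(VECTORIZED.get(rowModeUdfName.toLowerCase(Locale.ROOT)));
    }
}
```

Because the planner never sees `VectorExpression`, nothing in plan generation (or 'explain') has to change; the trade-off is that 'explain' cannot report the switch without extra work.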
bq. Could we somehow hybrid some of our existing UDFS to work from both engines?
We will certainly have to support the hybrid approach you are suggesting for
UDFs that users have implemented, even though we will recommend that users
re-implement their UDFs in vectorized fashion. For built-in Hive UDFs, however,
a vectorized implementation will almost always give better performance.
Eventually, we want vectorized implementations of all built-in UDFs.
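A hybrid path for user-written UDFs could look roughly like the adapter below. It is a hypothetical sketch: `RowModeUdfAdapter`, `DoubleColumn`, and the use of `UnaryOperator<Double>` as the shape of a row-mode UDF are assumptions, not Hive APIs. The adapter feeds a batch to an unmodified row-at-a-time function, which also shows why a native vectorized implementation is preferable: the per-row boxing and function call remain.

```java
import java.util.function.UnaryOperator;

// Hypothetical adapter: runs an unmodified row-mode UDF inside a
// vectorized pipeline by looping over the batch one row at a time.
class RowModeUdfAdapter {

    /** Minimal stand-in for a column of doubles with per-row null flags. */
    static final class DoubleColumn {
        final double[] vector;
        final boolean[] isNull;

        DoubleColumn(int size) {
            vector = new double[size];
            isNull = new boolean[size];
        }
    }

    private final UnaryOperator<Double> rowUdf; // user-written, row-at-a-time

    RowModeUdfAdapter(UnaryOperator<Double> rowUdf) {
        this.rowUdf = rowUdf;
    }

    /** Evaluates the row-mode UDF for each of the first {@code size} rows. */
    void evaluate(DoubleColumn in, DoubleColumn out, int size) {
        for (int i = 0; i < size; i++) {
            // Boxing/unboxing and an indirect call per row are the overhead
            // that a native vectorized implementation avoids.
            Double result = in.isNull[i] ? rowUdf.apply(null) : rowUdf.apply(in.vector[i]);
            out.isNull[i] = (result == null);
            out.vector[i] = (result == null) ? Double.NaN : result;
        }
    }
}
```

Null inputs are passed through to the row-mode function so that it keeps its own null semantics; a null result is recorded in `isNull` rather than in the value array.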
bq. Are we sure that functions that operate on doubles and floats are going to
round exactly the same way?
We use the same underlying Java libraries, so our results should match. In
our testing we also compare the results against non-vectorized results to make
sure.
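The kind of check mentioned above, comparing vectorized output with row-mode output, can be sketched like this. The class and method names are hypothetical, and `Math.log10` stands in for any math function. Both paths call the same Java library method, and the comparison is on exact bit patterns rather than within a tolerance.

```java
// Hypothetical consistency check: evaluates a math function row by row and
// in a batch loop, then compares the results bit for bit.
class VectorVsRowCheck {

    /** Row-mode style: one call per value, as a row-mode UDF would make. */
    static double rowMode(double x) {
        return Math.log10(x);
    }

    /** Vectorized style: a tight loop over a whole column. */
    static void vectorized(double[] in, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = Math.log10(in[i]);
        }
    }

    /** Returns the index of the first mismatch, or -1 if all rows agree exactly. */
    static int firstMismatch(double[] in) {
        double[] out = new double[in.length];
        vectorized(in, out, in.length);
        for (int i = 0; i < in.length; i++) {
            // Compare bit patterns so that NaN matches NaN and -0.0 differs from 0.0.
            if (Double.doubleToLongBits(out[i]) != Double.doubleToLongBits(rowMode(in[i]))) {
                return i;
            }
        }
        return -1;
    }
}
```

Including edge inputs such as 0.0, negative values, and NaN in the test data is what catches differences in special-value handling, which is where hand-written vectorized code is most likely to diverge.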
bq. Do we have a wiki page or something where we are keeping track of what is
currently supported using vectorization?
That's a good idea; I agree we should track this so that the community is
aware. It will also help and encourage folks to identify areas to contribute.
> implement vectorized math functions
> -----------------------------------
>
> Key: HIVE-4822
> URL: https://issues.apache.org/jira/browse/HIVE-4822
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: vectorization-branch
> Reporter: Eric Hanson
> Assignee: Eric Hanson
> Fix For: vectorization-branch
>
> Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch,
> HIVE-4822.5-vectorization.patch
>
>
> Implement vectorized support for all the built-in math functions. This
> includes implementing the vectorized operations and tying it all together in
> VectorizationContext so that it runs end-to-end. These functions include:
> Round(Col)
> Round(Col, N)
> Floor(Col)
> Ceil(Col)
> Rand(), Rand(seed)
> Exp(Col)
> Ln(Col)
> Log10(Col)
> Log2(Col)
> Log(base, Col)
> Pow(col, p), Power(col, p)
> Sqrt(Col)
> Bin(Col)
> Hex(Col)
> Unhex(Col)
> Conv(Col, from_base, to_base)
> Abs(Col)
> Pmod(arg1, arg2)
> Sin(Col)
> Asin(Col)
> Cos(Col)
> ACos(Col)
> Atan(Col)
> Degrees(Col)
> Radians(Col)
> Positive(Col)
> Negative(Col)
> Sign(Col)
> E()
> Pi()
> To reduce the total code volume, do an implicit type cast from non-double
> input types to double.
> Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so
> reuse code for those as appropriate.
> Try to call the function directly in the inner loop, and avoid new() or other
> expensive operations, as appropriate.
> Templatize the code where appropriate; e.g., all the unary functions of the
> form
> DOUBLE func(DOUBLE)
> can probably be done with a template.
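The templating idea in the description could be sketched as a small code generator that expands one template into a class per DOUBLE func(DOUBLE) function. Everything here is hypothetical (the `UnaryMathTemplate` class, the `<ClassName>`/`<Func>` placeholder syntax, and the generated class names); it only illustrates the approach. Each generated loop calls the function directly with no allocation, the `long[]` overload performs the implicit cast to double, and NEGATIVE reuses the same template by substituting unary minus for the function name. The sketch omits null handling and any special-casing of out-of-domain inputs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: expands one template into a source file per
// DOUBLE func(DOUBLE) function. Names and placeholders are illustrative only.
class UnaryMathTemplate {

    // <ClassName> and <Func> are placeholders. The long[] overload widens each
    // input to double (the implicit cast), so one template covers both types.
    static final String TEMPLATE = String.join("\n",
        "public class <ClassName> {",
        "  public static void evaluate(double[] in, double[] out, int n) {",
        "    for (int i = 0; i < n; i++) {",
        "      out[i] = <Func>(in[i]);",
        "    }",
        "  }",
        "  public static void evaluate(long[] in, double[] out, int n) {",
        "    for (int i = 0; i < n; i++) {",
        "      out[i] = <Func>((double) in[i]);",
        "    }",
        "  }",
        "}");

    /** Substitutes the class name and the function call into the template. */
    static String expand(String className, String func) {
        return TEMPLATE.replace("<ClassName>", className).replace("<Func>", func);
    }

    /** Generates source for a few functions; keys are hypothetical class names. */
    static Map<String, String> generateAll() {
        Map<String, String> funcs = new LinkedHashMap<>();
        funcs.put("FuncSinDoubleToDouble", "Math.sin");
        funcs.put("FuncSqrtDoubleToDouble", "Math.sqrt");
        funcs.put("FuncLnDoubleToDouble", "Math.log");
        funcs.put("FuncLog10DoubleToDouble", "Math.log10");
        funcs.put("FuncDegreesDoubleToDouble", "Math.toDegrees");
        // NEGATIVE is just unary minus: "-(x)" is valid Java, so it reuses the template.
        funcs.put("FuncNegativeDoubleToDouble", "-");

        Map<String, String> sources = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : funcs.entrySet()) {
            sources.put(e.getKey(), expand(e.getKey(), e.getValue()));
        }
        return sources;
    }
}
```

Expanding a text template at build time, rather than passing a function object into one shared loop, keeps each inner loop a direct static call, which matches the request to avoid expensive per-row operations.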
--
This message is automatically generated by JIRA.