[ 
https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718932#comment-13718932
 ] 

Jitendra Nath Pandey commented on HIVE-4822:
--------------------------------------------

bq. How does explain work with the vectorization engine?
  'explain' continues to work as before and returns the same plan as in 
non-vector mode.
Vectorization executes exactly the same query plan; only the implementation of 
the operators and expressions has been changed to run in vectorized fashion.
  However, we do plan to enhance 'explain' to also show which operators will be 
executed in vectorized mode. We will start working on it very soon and file a 
jira.

  In the current implementation, we don't need 'explain' annotations on 
vectorized UDFs, because the vectorized UDFs are used only at run time. In the 
query planning stage only row-mode UDFs are used; at query execution time, if 
vectorization is possible, we switch to the corresponding vectorized UDFs. We 
adopted this approach to avoid any changes to the query planner for 
vectorization.
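
  As a rough illustration of that run-time switch, here is a minimal sketch. 
It is not Hive code: the class RuntimeSwitchSketch, the VectorFn interface and 
the VECTORIZED registry are hypothetical names made up for this example. The 
plan only names the row-mode function; the executor picks a vectorized 
counterpart when one is registered and otherwise stays in row mode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch; none of these types are Hive classes.
final class RuntimeSwitchSketch {
    /** Vectorized counterpart: processes the first n entries of a column in one call. */
    interface VectorFn {
        void evaluate(double[] in, double[] out, int n);
    }

    // Assumed registry: row-mode function name -> vectorized implementation.
    static final Map<String, VectorFn> VECTORIZED = new HashMap<>();
    static {
        VECTORIZED.put("floor", (in, out, n) -> {
            for (int i = 0; i < n; i++) {
                out[i] = Math.floor(in[i]);
            }
        });
    }

    /** The plan carries only the row-mode function; the implementation is chosen here. */
    static double[] execute(String fnName, DoubleUnaryOperator rowFn, double[] column) {
        double[] out = new double[column.length];
        VectorFn vec = VECTORIZED.get(fnName);
        if (vec != null) {
            // Vectorized path: one call for the whole batch.
            vec.evaluate(column, out, column.length);
        } else {
            // Row-mode fallback: one call per row.
            for (int i = 0; i < column.length; i++) {
                out[i] = rowFn.applyAsDouble(column[i]);
            }
        }
        return out;
    }
}
```

  Because the choice is made inside execute, the plan that 'explain' prints 
would be identical in both cases.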

bq. Could we somehow hybrid some of our existing UDFS to work from both engines?
  We will surely have to support the hybrid approach you are suggesting for 
UDFs that users have implemented, even though we will recommend that users 
re-implement their UDFs in vectorized fashion. However, for built-in Hive UDFs, 
it will almost always be better to have a vectorized implementation for 
performance. Eventually, we do want to have vectorized implementations for all 
built-in UDFs.
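
  One possible shape for the hybrid approach is an adapter that lets an 
existing row-mode UDF be called on a batch. This is only a sketch under 
assumptions: RowUdfAdapterSketch and VectorExpr are hypothetical names, not 
Hive interfaces, and a plain double function stands in for a user UDF.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch; not a Hive API.
final class RowUdfAdapterSketch {
    /** Batch-at-a-time expression over the first n entries of a column. */
    interface VectorExpr {
        void evaluate(double[] in, double[] out, int n);
    }

    /** Wraps a row-at-a-time function so it can be driven by a batch loop. */
    static VectorExpr adapt(DoubleUnaryOperator rowUdf) {
        return (in, out, n) -> {
            for (int i = 0; i < n; i++) {
                // Still one indirect call per row, which is why a native
                // vectorized implementation is expected to be faster.
                out[i] = rowUdf.applyAsDouble(in[i]);
            }
        };
    }
}
```

  A user function such as x -> x * 2 could then be passed to adapt and 
evaluated batch by batch without being rewritten.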

bq. Are we sure that functions that operate on doubles and floats are going to 
round exactly the same way? 
  We use the same underlying Java libraries, so our results should match. In 
our testing we compare the results with non-vector results to make sure.
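
  A minimal sketch of that kind of check, assuming both paths call the same 
java.lang.Math method (Math.sqrt is only a stand-in here, and 
ResultParitySketch is a hypothetical name, not one of our test classes). It 
compares the raw bits so that even a difference in the last place, or in the 
sign of zero, would be caught.

```java
// Hypothetical sketch; not taken from the Hive test suite.
final class ResultParitySketch {
    /** Row-mode path: one value at a time. */
    static double rowMode(double x) {
        return Math.sqrt(x);
    }

    /** Vectorized path: same library call, applied in a tight loop. */
    static void vectorMode(double[] in, double[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = Math.sqrt(in[i]);
        }
    }

    /** Returns true if both paths agree bit for bit on every input. */
    static boolean bitwiseEqual(double[] input) {
        double[] vec = new double[input.length];
        vectorMode(input, vec, input.length);
        for (int i = 0; i < input.length; i++) {
            long expected = Double.doubleToLongBits(rowMode(input[i]));
            long actual = Double.doubleToLongBits(vec[i]);
            if (expected != actual) {
                return false;
            }
        }
        return true;
    }
}
```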

bq. Do we have a wiki page or something where we are keeping track of what is 
currently supported using vectorization?
  That's a good idea. I agree we should track this so that the community is 
aware. It will also help and encourage folks to identify areas to contribute.

                
> implement vectorized math functions
> -----------------------------------
>
>                 Key: HIVE-4822
>                 URL: https://issues.apache.org/jira/browse/HIVE-4822
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: vectorization-branch
>            Reporter: Eric Hanson
>            Assignee: Eric Hanson
>             Fix For: vectorization-branch
>
>         Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, 
> HIVE-4822.5-vectorization.patch
>
>
> Implement vectorized support for all the built-in math functions. This 
> includes implementing the vectorized operation, and tying it all together in 
> VectorizationContext so it runs end-to-end. These functions include:
> round(Col)
> Round(Col, N)
> Floor(Col)
> Ceil(Col)
> Rand(), Rand(seed)
> Exp(Col)
> Ln(Col)
> Log10(Col)
> Log2(Col)
> Log(base, Col)
> Pow(col, p), Power(col, p)
> Sqrt(Col)
> Bin(Col)
> Hex(Col)
> Unhex(Col)
> Conv(Col, from_base, to_base)
> Abs(Col)
> Pmod(arg1, arg2)
> Sin(Col)
> Asin(Col)
> Cos(Col)
> ACos(Col)
> Atan(Col)
> Degrees(Col)
> Radians(Col)
> Positive(Col)
> Negative(Col)
> Sign(Col)
> E()
> Pi()
> To reduce the total code volume, do an implicit type cast from non-double 
> input types to double. 
> Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so 
> reuse code for those as appropriate.
> Try to call the function directly in the inner loop and avoid new() or 
> expensive operations, as appropriate.
> Templatize the code where appropriate, e.g. all the unary functions of the 
> form DOUBLE func(DOUBLE)
> can probably be done with a template.
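
The description above suggests a per-function class with a direct call in the 
inner loop. A minimal sketch of what one such class might look like, assuming 
a template where only the Math call changes between functions. The class name 
FuncSqrtSketch and its methods are hypothetical, not the code in the attached 
patches.

```java
// Hypothetical sketch of one template instance for DOUBLE func(DOUBLE).
final class FuncSqrtSketch {
    // Output buffer is allocated once and reused for every batch,
    // so the inner loop never calls new.
    private final double[] out;

    FuncSqrtSketch(int maxBatchSize) {
        this.out = new double[maxBatchSize];
    }

    /** DOUBLE input: the function is called directly in the inner loop. */
    double[] evaluate(double[] in, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = Math.sqrt(in[i]); // the only line a template would vary
        }
        return out;
    }

    /** Non-double input: implicit cast to double, then the same function. */
    double[] evaluate(long[] in, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = Math.sqrt((double) in[i]);
        }
        return out;
    }
}
```

Note that the returned array is the reused buffer, so a caller would need to 
consume it before the next batch is evaluated.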

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
