GitHub user AlexanderSaydakov commented on the issue:

    https://github.com/apache/spark/pull/22144
  
    I believe there is a misunderstanding. This is not just about HLL Sketch 
UDFs. The way Hive UDAFs are executed appears to be wrong in general. It does 
not always manifest itself as a failure. The wrong kind of state is being 
passed to some merge() calls, which violates the contract described in Hive's 
documentation.
    
    
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
    
    A state is instantiated for PARTIAL1 mode, but is then passed to the 
merge() method, which should only be called in PARTIAL2 or FINAL mode. This is 
simply wrong. It just happens to work for many simple aggregations, which do 
not notice the difference.
    
    HLL Sketch UDFs use a different subclass of state as an optimization, so 
merge() fails to cast the state to the class it expects. A workaround would be 
to give up the optimization and use the same state in every mode, but I am not 
sure we can trust the results.
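
    To illustrate the failure mode, here is a minimal, self-contained sketch. 
It does not use the real Hive or DataSketches classes. Mode mirrors the names 
in GenericUDAFEvaluator, but Buffer, UpdateState, UnionState, and every method 
below are hypothetical stand-ins I made up for illustration. The assumption is 
that an evaluator picks its state class from the mode it was initialized with, 
so a buffer created for PARTIAL1 cannot be used by merge().

```java
public class UdafModeSketch {
    // Same mode names as Hive's GenericUDAFEvaluator.Mode.
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Hypothetical stand-in for an aggregation buffer.
    interface Buffer {}

    // State used when consuming raw rows (PARTIAL1, COMPLETE).
    static final class UpdateState implements Buffer { long count; }

    // State used when consuming partial results (PARTIAL2, FINAL).
    static final class UnionState implements Buffer { long total; }

    // The evaluator chooses the state class based on the mode.
    static Buffer newBuffer(Mode mode) {
        boolean rawInput = mode == Mode.PARTIAL1 || mode == Mode.COMPLETE;
        return rawInput ? new UpdateState() : new UnionState();
    }

    // Per the contract, called only in PARTIAL1 and COMPLETE.
    static void iterate(Buffer buf) {
        ((UpdateState) buf).count++;
    }

    // Per the contract, called only in PARTIAL2 and FINAL.
    static void merge(Buffer buf, long partial) {
        ((UnionState) buf).total += partial;
    }

    // Returns true if merge() throws when given a buffer created for 'mode'.
    static boolean mergeFailsOn(Mode mode) {
        Buffer buf = newBuffer(mode);
        try {
            merge(buf, 3L);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A PARTIAL1 buffer handed to merge(): the cast fails.
        System.out.println("PARTIAL1 buffer: " + mergeFailsOn(Mode.PARTIAL1));
        // A PARTIAL2 buffer handed to merge(): works as intended.
        System.out.println("PARTIAL2 buffer: " + mergeFailsOn(Mode.PARTIAL2));
    }
}
```

    An aggregation that uses a single state class for all modes would not hit 
the cast failure in this sketch, which is consistent with simple aggregations 
not noticing the problem.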


