GitHub user AlexanderSaydakov commented on the issue: https://github.com/apache/spark/pull/22144

I believe there is a misunderstanding. This is not just about the HLL Sketch UDFs. It appears to be an incorrect way of executing Hive UDFs in general; it just does not always manifest itself as a failure.

A wrong state is being passed to some merge() calls, which violates the contract described in Hive's documentation: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java

A state is instantiated for PARTIAL1 mode but is then passed to the merge() method, which according to that contract is only called in PARTIAL2 (or FINAL) mode. This is simply wrong. It happens to work for many simple aggregations, which don't notice the difference. The HLL Sketch UDFs use a different subclass of state as an optimization, so merge() fails to cast the state to the class it expects.

A workaround would be to give up the optimization and use the same state type in every mode, but I am not sure we can trust the results.
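To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Hive or Spark; the classes `SumEvaluator`, `RawBuffer`, and `MergedBuffer` are hypothetical. The method names mirror Hive's `GenericUDAFEvaluator` (`getNewAggregationBuffer`, `iterate`, `merge`, `terminatePartial`), but the signatures are simplified assumptions. The evaluator picks a buffer subclass per mode, the same kind of optimization described above, so a PARTIAL1 buffer handed to `merge()` throws a `ClassCastException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the mode contract documented in Hive's
// GenericUDAFEvaluator. No Hive or Spark classes are used; all names are illustrative.
public class ModeContractDemo {

    // Mirrors the four modes listed in the Hive javadoc.
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Marker type for per-group aggregation state.
    interface AggBuffer {}

    // State for modes that consume original rows (iterate() is called on it).
    static final class RawBuffer implements AggBuffer {
        final List<Long> rows = new ArrayList<>();
    }

    // A different state subclass for modes that call merge().
    static final class MergedBuffer implements AggBuffer {
        long total;
    }

    static final class SumEvaluator {
        private final Mode mode;

        SumEvaluator(Mode mode) { this.mode = mode; }

        // Chooses the buffer subclass from the mode, as an optimization.
        AggBuffer getNewAggregationBuffer() {
            boolean consumesRawRows = mode == Mode.PARTIAL1 || mode == Mode.COMPLETE;
            return consumesRawRows ? new RawBuffer() : new MergedBuffer();
        }

        // PARTIAL1 / COMPLETE: consume one original row.
        void iterate(AggBuffer buf, long row) {
            ((RawBuffer) buf).rows.add(row);
        }

        // PARTIAL2 / FINAL: fold in a partial result. Assumes a MergedBuffer.
        void merge(AggBuffer buf, long partial) {
            ((MergedBuffer) buf).total += partial;
        }

        // Produces a partial result from either buffer type.
        long terminatePartial(AggBuffer buf) {
            if (buf instanceof RawBuffer) {
                long sum = 0;
                for (long r : ((RawBuffer) buf).rows) sum += r;
                return sum;
            }
            return ((MergedBuffer) buf).total;
        }
    }

    // Follows the contract: merge() receives a buffer created in a merging mode.
    static long mergeFollowingContract(long... partials) {
        SumEvaluator partial2 = new SumEvaluator(Mode.PARTIAL2);
        AggBuffer buf = partial2.getNewAggregationBuffer();
        for (long p : partials) partial2.merge(buf, p);
        return partial2.terminatePartial(buf);
    }

    // Violates the contract: a buffer created for PARTIAL1 is handed to merge().
    // Returns true if merge() rejected it with a ClassCastException.
    static boolean mergeIntoPartial1BufferFails(long partial) {
        AggBuffer buf = new SumEvaluator(Mode.PARTIAL1).getNewAggregationBuffer();
        try {
            new SumEvaluator(Mode.PARTIAL2).merge(buf, partial);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mergeFollowingContract(3, 4));    // prints 7
        System.out.println(mergeIntoPartial1BufferFails(5)); // prints true
    }
}
```

If `getNewAggregationBuffer()` returned the same class in every mode, the second call would silently succeed instead of throwing, which is consistent with the point that simple aggregations don't notice the wrong state.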