[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

cloud-fan Tue, 23 Oct 2018 06:59:48 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22144
  
    Note that the `supportsPartial` flag was dropped in Spark 2.2, not 2.4.
    
    I'm not very familiar with the Hive code, so I don't know exactly how it is 
broken. In the worst case, Hive has some UDAFs that don't support partial 
aggregation, and Spark needs to adjust its aggregate framework. Or we are just 
adapting Hive UDAFs to Spark aggregate functions incorrectly, and we can simply 
work around it.
    
    I shouldn't state it as a feature; it's an ability of Spark's aggregate 
framework to stop partial aggregation for some functions.
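    
    As a rough illustration of the difference between partial and complete 
aggregation, here is a toy Python sketch. It is not Spark's or Hive's API: the 
`aggregate` helper, its parameters, and the `supports_partial` argument are 
hypothetical names chosen for this example only.

```python
from functools import reduce


def aggregate(partitions, init, update, merge, finish, supports_partial=True):
    """Toy two-phase aggregation (hypothetical helper, not Spark's API)."""
    if supports_partial:
        # Partial mode: fold each partition into its own buffer,
        # then merge the per-partition buffers into one.
        buffers = [reduce(update, part, init()) for part in partitions]
        final = reduce(merge, buffers, init())
    else:
        # Complete mode: gather all rows first, then fold them
        # into a single buffer; merge is never called.
        rows = [row for part in partitions for row in part]
        final = reduce(update, rows, init())
    return finish(final)


# Average, using a (sum, count) buffer.
def init():
    return (0, 0)


def update(buf, x):
    return (buf[0] + x, buf[1] + 1)


def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])


def finish(buf):
    return buf[0] / buf[1]


partitions = [[1, 2], [3, 4, 5]]
partial_result = aggregate(partitions, init, update, merge, finish, True)
complete_result = aggregate(partitions, init, update, merge, finish, False)
```

    A function whose `merge` step is missing or wrong can still produce the 
right answer in complete mode, which is why being able to skip partial 
aggregation for some functions is useful.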
    
    This fix is not ready. We should at least update the doc of 
`HiveUDAFFunction`, so that we know where we misunderstand the Hive UDAF 
framework.
    
    If we were at Spark 2.2, we should definitely revert the PRs that caused 
this issue. But it's 2.4 now, and reverting very old commits is not safe.
    
    Personally, I don't think this is a blocker that we have to fix before 
releasing. It's not a correctness issue, it doesn't impact a lot of users, and 
it has been there for nearly 2 years.


