Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/22144
  
    while I'm ok with not blocking 2.4 for this as well, it's not for many of the reasons stated. Note the JIRA was filed as Major, not a blocker. Based on the information we have: the number of users impacted by this issue seems low; it doesn't seem to cause a correctness issue, since it fails outright when hit; and the 2.4 release is far enough underway, with other things users are waiting on, that we don't want to delay it. But I think we need to investigate more and decide what we are doing with it. If we find that this does have higher impact, we can do a 2.4.1, and really we would want the fix in previous versions as well.
    
    I think the overall decision has to be based on the impact of the issue. As far as I know we don't have any written rules about this, but perhaps we need some. The ultimate decision is basically whether the release vote passes: if the PMC members pass it, they consider it sufficient as a release.
    
    I do also agree with @markhamstra about our criteria for calling it a non-blocker. We should not be making that decision based only on whether it is a regression from the last release.
    
    I do NOT agree with @cloud-fan on most of his points as to why this is ok.
    
    "After all, this is a bug and a regression from previous releases, like 
other 1000 we've fixed before. "
    
    I'll state this again: this should have very little to do with the decision on whether it's a blocker. If it's a correctness bug, are we going to ignore it because it's been wrong for multiple releases? The answer is NO, we better not. Many people don't upgrade immediately, so things aren't found right away, or it's an obscure thing that only happens occasionally, so it takes time to be reported. I do agree that the time the issue has been around does factor into the calculation of the impact, though.
    
     - is a hive compatibility bug. Spark fails to run some Hive UDAFs
    
    Really? What does it being a Hive compatibility bug have to do with anything? We state in our docs that we support Hive UDFs and UDAFs. You seem very unconcerned with this, which concerns me ("hive compatibility is not that important to Spark at this point"). Was there an official decision to drop this? If so, please point it out, as I would strongly -1 it; otherwise, anyone making changes here should keep compatibility, and our committers are the ones who should enforce this and make sure it happens. This is the basics of API compatibility. If we drop support for this, many users will be hurt. Just because your particular users don't use this doesn't mean others don't, and as a member of Apache you should be concerned with the community, not just your company's users.
    
    @cloud-fan, you are the one who removed the supportPartial flag here: https://issues.apache.org/jira/browse/SPARK-19060, so we were assuming you had some knowledge of the code in this area and might have the background on it.
    
    @srowen, your statement here also concerns me: "Dropping support for something in a minor release isn't crazy though." We should not be dropping features on purpose in minor releases; this again is an API compatibility thing, unless it's a developer API or experimental. Obviously things get dropped by accident, but we should not be doing it on purpose. Otherwise, why do we have minor vs. major releases at all?


