[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-11-12 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 I think we should add the support back. It sounded like some people didn't like this PR for a fix so we would need to investigate something else, @cloud-fan did you have more specifics about

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 So, 2.4 is out. Where are we? Rereading the comments above, looks we should: 1. Find the root cause 2. Officially drop it if the workaround is not easy 3. Fix it if the

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 Maybe we should discuss it in another thread, I think we should officially write down the procedure about how to evaluate a bug report and label it as a blocker or not. Currently

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/22144 Thanks @tgravescs for your latest posts -- they've saved me from posting something similar in many respects but more strongly worded. What is bothering me (not just in the discussion of

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan I agree with you that most issues do not have to block a release but I don't agree with most of your criteria for making that decision. I've already talked about the other

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 @tgravescs please quote my full comment instead of part of it. > After all, this is a bug and a regression from previous releases, like other 1000 we've fixed before. The point I

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22144 @tgravescs yes, I didn't say it was normal or good, but that it's not forbidden. Ex: no more Java 7 support in Spark 2.3. The point was: if this change were on purpose, then no it's not a

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 while I'm ok with not blocking 2.4 for this as well, not for many of the reasons stated though. Note the jira was filed a Major not a blocker. Based on the information we have, the impact on the

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Adding it as a known issue sounds reasonable to me as well. --- - To unsubscribe, e-mail:

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22144 I think this is different from the blocker tickets we opened before. We should try our best to avoid accidentally dropping the existing support. Please encourage more people in the community to

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 Unfortunately, we didn't drop it mistakenly. It's a mistake and we should fix it. What I try to avoid is adding back the `supportsPartial` flag. We should look into the root cause and see how to

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 > According to the policy, we don't have to block the current release because of i @cloud-fan, BTW, would you mind if I ask to share what you read? I want to be aware of the policy as

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 After all, this is a bug and a regression from previous releases, like other 1000 we've fixed before. According to the policy, we don't have to block the current release because of it, and this

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/22144 Yes, @rxin I know that I was a little unfair to you in order to make my point sharper. Apologies. My concern is real, though. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 @markhamstra how did you arrive at that conclusion? I said "it’s not a new regression and also somewhat esoteric" --- -

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/22144 @srowen I understand and agree. What bothers me is that the block-no block decision now often seems to be "not a regression; automatic no block" -- and that doesn't seem right to me. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22144 @markhamstra I do think there's an unspoken but legitimate consideration here, and that's that there is also a cost to not shipping the N thousand other things users are waiting on in this release.

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/22144 > It’s certainly not a blocker since it’s not a new regression I really hate this line of argument. Somehow we seem to have slipped from "if it is a regression, then we must block"

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread AlexanderSaydakov
Github user AlexanderSaydakov commented on the issue: https://github.com/apache/spark/pull/22144 I believe there is a misunderstanding. It is not just about HLL Sketch UDFs. It seems to be a wrong way of executing Hive UDFs in general. It does not always manifest itself as a failure.

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22144 So, some particular functionality worked in Spark 2.1, but didn't work starting in 2.2. If anything, that was the breaking change. (I wonder if it was on purpose, or documented, but, what's done is

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 The fact that it isn't a new regression shouldn't determine if its a blocker, there are lots of things you don't hit right away. The impact does affect if we decide if this is a

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Wait wait .. do we only care about regressions as blockers for the last release (2.3)? I'm asking this because I really don't know. If so, I'm okay. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 It’s certainly not a blocker since it’s not a new regression and also somewhat esoteric. Would be good to fix though. On Tue, Oct 23, 2018 at 8:20 AM Wenchen Fan wrote: >

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 This is not a PR that is ready to merge. We are likely talking about delaying 2.4.0 for multiple weeks because of this issue. Is it really worth? I'm not sure what's the exact policy,

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 For instance, https://groups.google.com/forum/?utm_medium=email_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ this discussion thread was started almost one year ago. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 It does try to fix a backward compatibility issue. It is found later now but still it is true we found a breaking issue to fix. Broken backward compatibility that potentially affects a set of

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 Note that the `supportsPartial` flag was dropped at Spark 2.2, not 2.4. I'm not very familiar with Hive code so I don't clearly know how it is broken. The worst case is, Hive has some

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan I disagree, it seems like you broke functionality that was there and you can't do that in a minor release. 3.0 would be fine to drop I think but we should fix it for 2.4, this PR

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 If we were going for 3.0, then I would definitely leave +1 and I agree that we should rather focus on Spark itself as a higher priority - we should do that when we go 3.0 and rather drop such

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 My feeling is that, hive compatibility is not that important to Spark at this point. *ALL* aggregate functions in Spark (including Spark UDAF) support partial aggregate, but now we need to

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 hmm .. this is actually a regression and compatibility issue we should take a look into .. --- - To unsubscribe, e-mail:

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-18 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan would you have time to look at this to at least see if this is something important? --- - To unsubscribe, e-mail:

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-08 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22144 hey, this looks important, could someone review this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22144 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95763/ Test PASSed. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22144 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22144 **[Test build #95763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95763/testReport)** for PR 22144 at commit

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22144 **[Test build #95763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95763/testReport)** for PR 22144 at commit

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-06 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22144 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-05 Thread pgandhi999
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan I have updated the PR and added the configurable flags for Window UDAF and Hive UDAF for now. Would appreciate to hear your thoughts. Thank you. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-23 Thread pgandhi999
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan I see a flag supportsPartial which was removed in PR https://github.com/apache/spark/pull/16461. Is this the flag you were talking about? If so, I was thinking that we make this flag

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-21 Thread pgandhi999
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/22144 @cloud-fan your comment makes sense, will work on it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22144 This should not be a global config but per-UDAF flag. IIRC we do have such a flag before, but get removed later. Maybe we should bring it back. ---

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread pgandhi999
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/22144 @dilipbiswal This property is by default set to true so it does not effect anything currently in the way UDAF's run. This property has been added purely for the purpose of maintaining backward

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22144 @pgandhi999 I have a basic question. Setting this property will have a global effect on all the aggregations ? --- - To

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22144 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22144 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22144 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional