Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@HyukjinKwon thanks!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19406
Ah, that's fine :). It was just an option. I will follow discussion and
help sort it out in any event.
---
-
To
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@HyukjinKwon These two JIRAs change percentile_approx in different ways, so
maybe it's better to use different JIRAs?
---
-
To
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@HyukjinKwon uh...just saw this, already created a new [JIRA](url) and
[PR](https://github.com/apache/spark/pull/19438), is it also ok?
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19406
Oh, optionally, we can just edit the JIRA I guess.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@srowen @jiangxb1987 OK, I'll close this JIRA and creating a new JIRA as
improvement instead of bugfix.
---
-
To unsubscribe,
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/19406
This is actually a bugfix instead of improvement, I think we should follow
the approach that @srowen have suggested.
---
-
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82419/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82419 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82419/testReport)**
for PR 19406 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19406
cc @WeichenXu123
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
Your proposed way is more consistent with the paper and I also think it's
more natural to start from 0.
The purpose of this patch is to not introduce accuracy regression for other
cases, and
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
I think we should probably follow the paper, no? this should fix more
cases. Yes, this case also failed for me. The answer 499 is OK too.
---
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@srowen I tried your way on my laptop, it works fine. But an existing test
case failed: (500 expected , 499 returned)
```
test("percentile_approx, supports constant folding for parameter
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
Your new test also passes when changing ...
```
val targetError = relativeError * count
...
var i = 0
```
I think this is more likely to be the correct fix as
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82419 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82419/testReport)**
for PR 19406 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
I read the original paper at
http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf and
think there might be two bugs in the implementation here that might actually
cause the
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@srowen Before the change, the answer in the test case is `2, 2, 2`.
Based on the code before the change, percentile_approx would never return
the first element when percentile is in
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82384/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82384 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82384/testReport)**
for PR 19406 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
(What is the answer in your test case before the change?)
Yea, I guess I am not sure why this method has to use relativeError at all,
but I didn't think about it much. If it didn't I think it
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
Yes it's ok if consider it's approximate with relativeError. But the point
of this pr is that there could be a further improvement in those special cases.
IMHO it's more accurate and intuitive for
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82384 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82384/testReport)**
for PR 19406 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82379/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82379 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82379/testReport)**
for PR 19406 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82379 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82379/testReport)**
for PR 19406 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
It's probably OK, though kind of special cases the first quantile. This
expression doesn't incorporate relativeError. I'm actually not sure why it
needs to account for relativeError here; it's an
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
@srowen For example, given input data 1 to 10, if a user queries 10% (or
even less) percentile, it should return 1, because the first value 1 already
reaches 10% percentage. Without this fix, it
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
The JIRA doesn't explain what this is meant to fix. What case does this
help get more correct?
---
-
To unsubscribe, e-mail:
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82368 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82368/testReport)**
for PR 19406 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82368/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19406
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19406
**[Test build #82368 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82368/testReport)**
for PR 19406 at commit
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19406
cc @thunterdb @cloud-fan @gatorsmile
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
36 matches
Mail list logo