Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@smurching OK, I will close this PR and resubmit it to the new ticket.
---
-
To unsubscribe, e-mail:
Github user smurching commented on the issue:
https://github.com/apache/spark/pull/17014
Hi @zhengruifeng, thanks for your work on this!
Now that we're introducing a new handlePersistence parameter (a new public
API), it'd be good to track work in a separate JIRA/PR as
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81393/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81393 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81393 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)**
for PR 17014 at commit
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
Jenkins, retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81376/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81376 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81376/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81376 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81376/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81373 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81373/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81373/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81373 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81373/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81372 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81372/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81372/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81372 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81372/testReport)**
for PR 17014 at commit
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng `KMeans` is treated as a bugfix (SPARK-21799) because the
double-cache issue was introduced in 2.2 and causes a perf regression.
Other algos also have the same issue, but the issue
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 @jkbradley I am curious why `ml.KMeans` is special
enough to need a separate PR
---
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng @jkbradley I created PR #19107 as a quick fix for the `KMeans`
perf regression bug.
This PR can continue to work on adding the `handlePersistence` Param, which
is not so urgent.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17014
Thanks all for discussing this! I'm just catching up now.
I'm OK with adding handlePersistence as a new Param, but please do so in a
separate PR and JIRA issue. I'd like to backport the
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 Sounds good. And since adding `handlePersistence` as an
`ml.Param` may influence many algs (more than those in this PR), I think we may
need more discussion @MLnick @yanboliang
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@smurching Yes, this should be added as an `ml.Param`; we should not add it as
an argument.
@zhengruifeng Would you mind updating the PR according to the result of our
discussion above?
Make
Github user smurching commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 That approach sounds reasonable to me.
My main thought (& this might be obvious) is on the implementation level --
as long as we implement this by adding an
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
I have been thinking about this double-cache issue for a few days. One big
problem is that it is hard to get precise storage level info. For example, we
may add a `map` transform on a cached dataset and then
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
@zhengruifeng OK, so the `KMeans` part of this PR still works. No
need to change it, I think.
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 The current impl of `mllib.KMeans` does not seem to support caching;
it just (log
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 Agreed that we should pass `handlePersistence` to the mllib impl.
Thanks for pointing it out!
---
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17014
cc @zhengruifeng
I updated my comment, so you need to check it again, thanks!
I read the PR again; it still does not resolve the double-caching issue in KMeans.
In KMeans, your code
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 @yanboliang I have updated this PR according to the comments.
Thanks.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81211/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81211 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81211/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #81211 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81211/testReport)**
for PR 17014 at commit
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @MLnick ?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79263/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #79263 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79263/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #79263 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79263/testReport)**
for PR 17014 at commit
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @MLnick Can you help review this?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76663/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #76663 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76663/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #76663 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76663/testReport)**
for PR 17014 at commit
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
Jenkins, retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @MLnick Can you help review this?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #75414 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75414/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75414/
Test PASSed.
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@hhbyyh Thanks for your comments! And sorry for the late reply.
I updated this PR:
1. limit the scope, only modifying algorithms in which double-caching already exists
2. add a function
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #75414 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75414/testReport)**
for PR 17014 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17014
I'm trying to refresh my memory and clarify the targets on this topic;
basically we want to achieve:
1. Avoid double caching. If the input Dataset is already cached, then we should
not cache the
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @hhbyyh
I updated the PR, can you please help review this? Thanks in advance.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74935/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74935 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74935/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74935 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74935/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74932/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74932 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74932/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74932 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74932/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74925/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74925 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74925/testReport)**
for PR 17014 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74921/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17014
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74921 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74921/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74925 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74925/testReport)**
for PR 17014 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17014
**[Test build #74921 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74921/testReport)**
for PR 17014 at commit
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@hhbyyh I think I misunderstood your comments in JIRA. I will update this
PR with the new plan: directly add `protected var storageLevel` in `Predictor`,
without adding a setter and getter for it
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17014
Hi @zhengruifeng , is there any update?
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @hhbyyh ?
---