Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Thanks @WeichenXu123 for the comments.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Please let me know if there are any unresolved comments. Thanks.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82766/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82766 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82766 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)**
for PR 17862 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82377/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82377 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82377 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)**
for PR 17862 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82376/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82376 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #82376 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)**
for PR 17862 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81836/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #81836 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #81836 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)**
for PR 17862 at commit
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17862
@hhbyyh Test results look good!
OWLQN takes longer per iteration because the line search in each iteration
makes more passes over the dataset.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Tested on several larger datasets with the hinge loss function to compare the
L-BFGS and OWLQN solvers.
Each run continued until convergence or until maxIter (2000) was exceeded.
dataset | numRecords | numFeatures |
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81367/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #81367 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #81367 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)**
for PR 17862 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Sure, I can find some larger datasets to test with.
But I guess, as shown in the PR description, LBFGS will generally
outperform OWLQN, but not in all cases. I assume single large scale
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17862
+1 for adding tests on large-scale datasets.
Another thing I would like to know: can you compare the final loss value
of the resulting coefficients between LIBLINEAR (scikit-learn), LBFGS,
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
+1 @jkbradley for testing on large-scale datasets. @hhbyyh Do you have time to
test it? If not, I can help. Thanks.
---
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17862
Catching up here... To make sure I caught the decisions made in the
discussion above, is it correct that this PR will:
* Add support for squared hinge loss, and use that as the default (which
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Given the discussion above, I plan to replace OWLQN with LBFGS. I will send
an update soon.
---
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
+1 @WeichenXu123 IIRC softmax regression also includes a non-differentiable
point, and we can use LBFGS to solve it as well. We can support _squared hinge
loss_, which is a smooth function, in the future, so
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
cc @WeichenXu123 What do you think about this?
---
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
I'm in favor of discarding OWLQN. Take LiR or LoR as examples: if you
replace LBFGS with OWLQN for regression with L2 regularization, we can see
OWLQN may converge faster than LBFGS in a certain
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Yes, both LBFGS and OWLQN generate models similar to sklearn's when fitting
without an intercept.
About replacing OWLQN with LBFGS, I noticed that with hinge loss, OWLQN
sometimes uses fewer iterations
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
@hhbyyh Makes sense. Does it mean both LBFGS and OWLQN produce the same
solution when fitting without an intercept? If so, I'd prefer to change the solver
to LBFGS rather than adding a new option.
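
For reference, a minimal Python sketch of the two losses named in this thread, hinge and squared hinge. The function names and the finite-difference check are illustrative assumptions and are not taken from the PR or from Spark. The sketch only shows that hinge loss has a kink at margin 1 (its left and right slopes differ), while squared hinge loss has matching slopes there, which is the smoothness property the solver discussion refers to.

```python
def hinge(margin):
    """Hinge loss max(0, 1 - m), where m = y * (w . x) is the margin."""
    return max(0.0, 1.0 - margin)


def squared_hinge(margin):
    """Squared hinge loss max(0, 1 - m)^2."""
    return max(0.0, 1.0 - margin) ** 2


def one_sided_slopes(loss, margin, eps=1e-6):
    """Finite-difference slopes just left and just right of `margin`.

    Hypothetical helper for illustration only.
    """
    left = (loss(margin) - loss(margin - eps)) / eps
    right = (loss(margin + eps) - loss(margin)) / eps
    return left, right


if __name__ == "__main__":
    # At margin 1 the hinge slopes disagree; the squared hinge slopes agree.
    print("hinge:", one_sided_slopes(hinge, 1.0))
    print("squared hinge:", one_sided_slopes(squared_hinge, 1.0))
```

Away from margin 1 both losses are differentiable, so the difference only matters for data points that sit exactly on the margin.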
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79002/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #79002 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #79002 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)**
for PR 17862 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
@yanboliang Without an intercept, sklearn and Spark LinearSVC get the
same coefficients on the several datasets I tested.
---
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/17862
@hhbyyh If different handling of intercept scaling is the major cause of the
difference in results between sklearn and Spark, did you check whether fitting
without an intercept produces the same model?
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
On many large datasets, LinearSVC does not get results similar to
sklearn's. E.g., sklearn may get coefficients (5, 10, 15, 20), and Spark
LinearSVC will get (10, 20, 30, 40). It's different but in
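
The comment above is cut off in the archive, so how the two results were compared is not known. As a hedged illustration only: the two example coefficient vectors differ by a positive scale factor, and for a linear classifier without an intercept such vectors give the same predicted labels. The sketch below assumes a simple "score > 0 means class 1" rule, and the feature rows are made up for the example; neither comes from the PR.

```python
def predict(coefficients, features):
    """Predict 1 if the linear score w . x is positive, else 0.

    Assumed decision rule without an intercept, for illustration only.
    """
    score = sum(w * x for w, x in zip(coefficients, features))
    return 1 if score > 0 else 0


def same_predictions(w_a, w_b, rows):
    """True if both coefficient vectors label every row identically."""
    return all(predict(w_a, row) == predict(w_b, row) for row in rows)


# Coefficient vectors quoted in the comment above.
sklearn_like = [5.0, 10.0, 15.0, 20.0]
spark_like = [10.0, 20.0, 30.0, 40.0]

# Hypothetical feature rows, not data from the PR.
rows = [
    [1.0, 0.0, -1.0, 1.0],
    [-2.0, 1.0, 0.0, -0.5],
    [3.0, -1.0, -1.0, 0.2],
]

if __name__ == "__main__":
    print(same_predictions(sklearn_like, spark_like, rows))
```

Identical labels do not imply identical loss values: the hinge loss and any regularization term still depend on the scale of the coefficients.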
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78731/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78731 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78731 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)**
for PR 17862 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78119/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78119 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)**
for PR 17862 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78119 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)**
for PR 17862 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Sure. That's reasonable. I'll move the hingeAggregator to a new PR.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78024/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17862
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78024 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)**
for PR 17862 at commit
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17862
@hhbyyh Thanks for doing the extra work to use the new aggregator here. I
do think it's better to separate those changes from this one, though. There is
actually more that needs to be done for the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17862
**[Test build #78024 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)**
for PR 17862 at commit