[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Thanks @WeichenXu123 for the comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Please let me know if there are any unresolved comments. Thanks.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82766/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82377/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test FAILed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82376/ Test FAILed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #82376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81836/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh Test result looks good! OWLQN takes longer per iteration because its line search makes more passes over the dataset in each iteration.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Tested with several larger datasets using the hinge loss function, to compare the L-BFGS and OWLQN solvers. Each run continued until convergence or until maxIter (2000) was exceeded. dataset | numRecords | numFeatures |

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81367/ Test FAILed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Sure, I can find some larger datasets to test with. But I guess, as shown in the PR description, LBFGS will generally outperform OWLQN, but not in all cases. I assume single large scale

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17862 +1 for adding tests on large-scale datasets. Another thing I'd like to know: you could compare the final loss value of the resulting coefficients between LIBLINEAR (scikit-learn), LBFGS,
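
A minimal sketch of the comparison suggested above: evaluate one shared objective on each solver's final coefficients, so the solvers are judged by the loss they reach rather than by the coefficients themselves. Everything here is an assumption for illustration and is not taken from the PR: the objective form (mean hinge loss plus 0.5 * reg_param * ||w||^2), the 0/1 label encoding, the toy data, and the candidate coefficient vectors.

```python
def hinge_objective(coefs, intercept, data, reg_param):
    """Mean hinge loss plus an L2 penalty (assumed form, for illustration only).

    data is a list of (features, label) pairs with label in {0, 1}.
    """
    total = 0.0
    for features, label in data:
        y = 2.0 * label - 1.0  # map {0, 1} to {-1, +1}
        score = sum(w * x for w, x in zip(coefs, features)) + intercept
        total += max(0.0, 1.0 - y * score)
    penalty = 0.5 * reg_param * sum(w * w for w in coefs)
    return total / len(data) + penalty


# Toy dataset and hypothetical "final coefficients" from two solvers.
toy_data = [([2.0, 0.0], 1), ([-2.0, 0.0], 0)]
candidates = {
    "solver_a": [1.0, 0.0],
    "solver_b": [0.0, 0.0],
}

final_losses = {
    name: hinge_objective(coefs, 0.0, toy_data, reg_param=0.1)
    for name, coefs in candidates.items()
}
for name, loss in sorted(final_losses.items(), key=lambda item: item[1]):
    print(name, loss)
```

Using the same objective and regularization strength for every candidate is what makes the final loss values comparable.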

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 +1 to @jkbradley on testing with large-scale datasets. @hhbyyh Do you have time to test it? If not, I can help. Thanks.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17862 Catching up here... To make sure I caught the decisions made in the discussion above, is it correct that this PR will: * Add support for squared hinge loss, and use that as the default (which

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Given the discussion above, I plan to replace OWLQN with LBFGS. I will send an update soon.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-17 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 +1 @WeichenXu123 IIRC softmax regression also includes a non-differentiable point, and we can use LBFGS to solve it as well. We can support _squared hinge loss_, which is a smooth function, in the future, so
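
To illustrate the smoothness point for the two SVM losses, here is a small sketch using the standard definitions max(0, 1 - m) and max(0, 1 - m)^2, where m is the signed margin. The helper names and the finite-difference step are assumptions made for this example, not code from the PR. It estimates the slope just left and just right of m = 1: the hinge loss has a kink there, while the squared hinge loss does not.

```python
def hinge(margin):
    """Hinge loss max(0, 1 - m), where m = y * f(x) is the signed margin."""
    return max(0.0, 1.0 - margin)


def squared_hinge(margin):
    """Squared hinge loss max(0, 1 - m) ** 2."""
    return max(0.0, 1.0 - margin) ** 2


def one_sided_slopes(loss, m, h=1e-6):
    """Finite-difference slopes of `loss` just left and just right of m."""
    left = (loss(m) - loss(m - h)) / h
    right = (loss(m + h) - loss(m)) / h
    return left, right


hinge_left, hinge_right = one_sided_slopes(hinge, 1.0)
sq_left, sq_right = one_sided_slopes(squared_hinge, 1.0)

# Hinge: the slopes disagree (about -1 on the left, 0 on the right).
print("hinge slopes at m=1:", hinge_left, hinge_right)
# Squared hinge: both slopes are about 0, so the gradient is continuous.
print("squared hinge slopes at m=1:", sq_left, sq_right)
```

A gradient-based solver such as LBFGS assumes a well-defined gradient, which is why the kink in the plain hinge loss comes up in this discussion.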

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-16 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 cc @WeichenXu123 What do you think about this?

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-06 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 I'm in favor of discarding OWLQN. Take LiR or LoR as examples: if you replace LBFGS with OWLQN for regression with L2 regularization, we can see that OWLQN may converge faster than LBFGS in a certain

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-05 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Yes, both LBFGS and OWLQN generate a model similar to sklearn's when fitting without intercept. About replacing OWLQN with LBFGS, I noticed that with hinge loss, OWLQN sometimes uses fewer iterations

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-01 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh Makes sense. Does it mean both LBFGS and OWLQN produce the same solution when fitting without intercept? If so, I'd prefer to change the solver to LBFGS rather than adding a new option.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79002/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #79002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #79002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 @yanboliang Without intercept, sklearn and Spark LinearSVC get the same coefficients on the several datasets I tested.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh If different handling of intercept scaling is the major cause of the difference in results between sklearn and Spark, did you check whether fitting a model without intercept produces the same model?

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 On many large datasets, LinearSVC cannot get a result similar to sklearn's. E.g., sklearn may get coefficients (5, 10, 15, 20), and Spark LinearSVC will get (10, 20, 30, 40). It's different but in
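
One property of the two example vectors quoted above, sketched under stated assumptions: (10, 20, 30, 40) is exactly twice (5, 10, 15, 20), so for a linear model with no intercept both give the same sign of the decision value on any input. The sample feature vectors and helper names below are made up for illustration; this says nothing about how the two libraries behave in general, and a regularized objective would still score the two vectors differently.

```python
import math


def decision(coefs, features):
    """Linear decision value w . x, assuming no intercept term."""
    return sum(w * x for w, x in zip(coefs, features))


def predict(coefs, features):
    """Predict class 1 when the decision value is positive, else class 0."""
    return 1 if decision(coefs, features) > 0 else 0


def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


coefs_a = [5.0, 10.0, 15.0, 20.0]    # the example sklearn coefficients
coefs_b = [10.0, 20.0, 30.0, 40.0]   # the example Spark coefficients

# Hypothetical feature vectors, chosen only to exercise both classes.
samples = [
    [1.0, -1.0, 0.5, 0.0],
    [-2.0, 0.3, 0.1, -0.4],
    [0.0, 0.0, 1.0, -1.0],
]

predictions_a = [predict(coefs_a, x) for x in samples]
predictions_b = [predict(coefs_b, x) for x in samples]
direction_match = cosine_similarity(coefs_a, coefs_b)

print(predictions_a, predictions_b, direction_match)
```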

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78731/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78119/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Sure. That's reasonable. I'll move the hingeAggregator to a new PR.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78024/ Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test PASSed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh Thanks for doing the extra work to use the new aggregator here. I do think it's better to separate those changes from this one, though. There is actually more that needs to be done for the

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #78024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)** for PR 17862 at commit