[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

holdenk Sat, 16 Jan 2016 09:10:52 -0800

GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/10788


    [SPARK-7780][MLLIB] intercept in logisticregressionwith lbfgs should not be 
regularized

    The intercept in logistic regression represents a prior on the categories 
and should not be regularized. In MLlib, regularization is handled through the 
Updater, and the Updater penalizes all the components without excluding the 
intercept, which results in poor training accuracy with regularization.
    The new implementation in the ML framework handles this properly, and we 
should call the implementation in ML from MLlib since the majority of users are 
still using the MLlib API.
    Note that both of them perform feature scaling to improve convergence, and 
the only difference is that the ML version doesn't regularize the intercept. As 
a result, when lambda is zero, they will converge to the same solution.
    
    This was previously partially reviewed at 
https://github.com/apache/spark/pull/6386#issuecomment-168781424 
and is re-opened for @dbtsai to review.
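
The effect described above can be reproduced with a small, self-contained sketch. This is plain Python rather than Spark code, and it is not the implementation in this pull request: the function `fit_logistic`, the toy data set, the step size, and the iteration count are all assumptions made for illustration. The data has 90% positive labels and a single feature that carries no information, so the unpenalized intercept should settle near log(0.9 / 0.1) = log(9), while an L2 penalty on the intercept pulls it toward zero.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, reg_param, penalize_intercept, steps=5000, lr=0.5):
    """Gradient descent on mean log-loss plus an L2 penalty (single feature).

    Hypothetical helper for illustration only. When penalize_intercept is
    False, the penalty is applied to the weight w but not to the intercept b.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / n
        # Gradient of the L2 penalty (reg_param / 2) * value^2.
        grad_w += reg_param * w
        if penalize_intercept:
            grad_b += reg_param * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data: for each feature value, nine positive labels and one negative.
data = [(x, y) for x in (-1.0, 1.0) for y in [1.0] * 9 + [0.0]]

w_free, b_free = fit_logistic(data, 0.1, penalize_intercept=False)
w_pen, b_pen = fit_logistic(data, 0.1, penalize_intercept=True)

print("intercept, penalty on weights only:", b_free)
print("intercept, penalty on everything:  ", b_pen)
print("log-odds of the class prior:       ", math.log(9))
```

With the regularization parameter set to zero, the two variants run the same updates and reach the same solution, which matches the note in the description about lambda being zero.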

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10788.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10788
    
----
commit a529c013fa722748cbd1d3878e4ea3bed5b15181
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-22T20:54:59Z

    document plans

commit f9e26350d15d7d36b75ece4f4718797dbe2a0830
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-22T22:53:29Z

    Some progress.

commit 7ebbd566e20923efc32dee1cfcf12ea315259e30
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-22T23:16:18Z

    Keep track of the number of requested classes so that if it's more than 2 we 
use the legacy implementation. Also allow pass through of initialWeights

commit ef2a9b0f5b6cb2e971c2e5371f3394b4dec64574
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-22T23:48:06Z

    Expose a train on instances method within Spark, use numOfLinearPredictors 
instead of keeping track of class variable, pass through persistence information

commit 407491e38b1a5834d26a137ab20829a3d96f5142
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T01:14:04Z

    tests are fun

commit e02bf3a9688d1efa2f3da60b3d9f27911b04955d
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T07:42:13Z

    Start updating the tests to run with different updaters.

commit 8517539d0e8829833968dcb7e47ad8ba20849cb1
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T08:00:36Z

    get the tests compiling

commit a619d42b821575afd8efa90f2a38edf9690eb0df
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T08:04:53Z

    style fixed

commit 4febcc32f524edadeb68dc674e2681a087ffaa38
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T08:13:23Z

    make the test method private

commit e8e03a13ba04c6b3100e290a5c435959c2f01912
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-24T20:16:13Z

    CR feedback, pass RDD of Labeled points to ml implementation. Also from 
tests require that feature scaling is turned on to use ml implementation.

commit 38a024bd9a36e83ef8005a5f2af8a4dd44f6760e
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-25T07:24:21Z

    Convert it to a df and use set for the initial params

commit 478b8c5d5ff20478dc4ba913b0c77172e0abdfff
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-25T20:06:57Z

    Handle non-dense weights

commit 08589f58b81bc1e6099b425f86226053c5b6a360
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-26T03:39:54Z

    CR feedback: make the setInitialWeights function private, don't mess with 
the weights when they are user supplied, validate that the user supplied 
weights are reasonable.

commit f40c401496ae1e6cc7b39db820fea194d42c25c5
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-05-26T04:19:46Z

    style fix up

commit f35a16aa8110a33c32959db674908d145be6e97f
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-02T23:29:11Z

    Copy the number of iterations, convergence tolerance, and if we are fitting 
an intercept from mllib to ml when training lbfgs model using ml code

commit 4d431a358074f5245abcbc95af3e2bdf75b4f21d
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-03T00:39:48Z

    scala style check issue

commit 7e4192849efc6d282633159a15c7dd41376aa1a3
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-03T07:30:48Z

    Only the weights if we need to.

commit ed351ffdf862994389b41284f95aa148c6550f41
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-03T19:39:56Z

    Use appendBias for adding intercept to initial weights, fix 
generateInitialWeights

commit 3ac02d72cab72b35b7cc76c50d7088d4b98bfd9d
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-08T20:20:19Z

    Merge in master

commit d1ce12ba45f12d93b962ffd560242757eda739c2
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-09T20:13:21Z

    Merge in master

commit 8ca0fa927bd2773ceb4ccf740445058ead706f7a
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-08-28T21:57:51Z

    attempt to merge in master

commit 6f66f2cbc7d80335bfb0e2e5b8b430930206d06f
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-10-01T23:05:01Z

    Merge in master (again)

commit 0cedd50368eeda594eafdb9500ed162ff33f2e25
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-10-02T01:44:08Z

    Fix compile error after simple merge

commit 2bf289b2ab92ff9da742d22e1feda0b57f8a796c
Author: Holden Karau <hol...@us.ibm.com>
Date:   2015-12-30T18:41:30Z

    Merge branch 'master' into 
SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized

commit d7a26318be962eede7d6fa0792f1f4d72178dc8d
Author: Holden Karau <hol...@us.ibm.com>
Date:   2016-01-16T03:21:04Z

    Merge in master

commit b0fe1e68bf8e7fc13cc845db90e7eb27729545d9
Author: Holden Karau <hol...@us.ibm.com>
Date:   2016-01-16T03:24:08Z

    scala style import order fix

commit 827dcdec09414c5b25a66be359c4d651a9e18ee6
Author: Holden Karau <hol...@us.ibm.com>
Date:   2016-01-16T06:24:33Z

    Import ordering

----


