Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-10-09 Thread DB Tsai
Nice to hear that your experiment is consistent with my assumption. The current L1/L2 regularization will penalize the intercept as well, which is not ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly the same solution as R, but with scalability in the number of rows and columns. Stay tuned! Sincerely,
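
To make the intercept point concrete, here is a minimal sketch (plain Scala, not MLlib internals; the helper names are made up for this note) contrasting an L2 objective that penalizes only the feature weights with one that also penalizes the intercept:

  // Negative log-likelihood of a logistic model with weights w and intercept b.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def negLogLik(data: Seq[(Array[Double], Double)], w: Array[Double], b: Double): Double =
    data.map { case (x, y) =>
      val margin = x.zip(w).map { case (xi, wi) => xi * wi }.sum + b
      val p = sigmoid(margin)
      -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    }.sum

  def l2(w: Array[Double]): Double = w.map(wi => wi * wi).sum

  // Penalize the feature weights only (the R/glmnet convention):
  def objWeightsOnly(data: Seq[(Array[Double], Double)], w: Array[Double], b: Double, lambda: Double): Double =
    negLogLik(data, w, b) + 0.5 * lambda * l2(w)

  // Also penalize the intercept, which biases b toward zero:
  def objWithIntercept(data: Seq[(Array[Double], Double)], w: Array[Double], b: Double, lambda: Double): Double =
    negLogLik(data, w, b) + 0.5 * lambda * (l2(w) + b * b)

The second form is the behavior being criticized above; the GLMNET/OWLQN work presumably targets the first.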

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-09-29 Thread Xiangrui Meng
The test accuracy doesn't determine the total loss. Any decision boundary between -1 and +1 can separate the points -1 and +1 and give you 1.0 accuracy, but the corresponding losses are different. -Xiangrui On Sun, Sep 28, 2014 at 2:48 AM, Yanbo Liang yanboha...@gmail.com wrote: Hi We have used LogisticRegression
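
As a small illustration (not from the thread; the toy data are made up for this note): take two 1-D points, one at -1 with label 0 and one at +1 with label 1. Any positive weight classifies both correctly, yet the logistic loss differs and keeps shrinking as the weight grows:

  val toy = Seq((-1.0, 0.0), (1.0, 1.0))  // (feature, label)

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def logLoss(w: Double): Double =
    toy.map { case (x, y) =>
      val p = sigmoid(w * x)
      -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    }.sum / toy.size

  def accuracy(w: Double): Double =
    toy.count { case (x, y) => (if (sigmoid(w * x) > 0.5) 1.0 else 0.0) == y }.toDouble / toy.size

  Seq(1.0, 5.0, 10.0).foreach { w =>
    println(f"w = $w%4.1f  accuracy = ${accuracy(w)}%.1f  logLoss = ${logLoss(w)}%.6f")
  }

All three weights give 1.0 accuracy, but the loss drops from roughly 0.31 toward 0, which is the distinction being made here.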

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-09-29 Thread DB Tsai
Can you check the loss of both the LBFGS and SGD implementations? One reason may be that SGD doesn't converge well; you can see that by comparing the two log-likelihoods. Another potential reason may be that the labels of your training data are totally separable, so you can always increase the log-likelihood by
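
A minimal sketch of that check, assuming the Spark 1.1 MLlib API, an RDD `training: RDD[LabeledPoint]`, and two models `sgdModel` and `lbfgsModel` trained as in the original post (the helper `meanLogLoss` is made up for this note):

  import org.apache.spark.SparkContext._
  import org.apache.spark.mllib.classification.LogisticRegressionModel
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  // Mean logistic loss (negative log-likelihood per example) on a dataset.
  // clearThreshold() makes predict() return the raw class-1 probability
  // instead of a 0/1 label.
  def meanLogLoss(model: LogisticRegressionModel, data: RDD[LabeledPoint]): Double = {
    model.clearThreshold()
    val eps = 1e-15  // guard against log(0)
    data.map { lp =>
      val p = math.min(math.max(model.predict(lp.features), eps), 1.0 - eps)
      -(lp.label * math.log(p) + (1.0 - lp.label) * math.log(1.0 - p))
    }.mean()
  }

  println(s"SGD   training log-loss: ${meanLogLoss(sgdModel, training)}")
  println(s"LBFGS training log-loss: ${meanLogLoss(lbfgsModel, training)}")

If the SGD loss is noticeably higher, SGD simply hasn't converged; if both losses keep improving as the weights grow, the separable-data explanation in this message applies.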

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-09-29 Thread Yanbo Liang
Thank you for all your patient responses. I can conclude that if the data is totally separable or over-fitting occurs, the weights may be different. This is also consistent with my experiment. I have evaluated two different datasets, and the results are as follows: Loss function: LogisticGradient Regularizer: L2

[MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-09-28 Thread Yanbo Liang
Hi We have used LogisticRegression with two different optimization methods, SGD and LBFGS, in MLlib. With the same dataset and the same training/test split, we get different weight vectors. For example, we use spark-1.1.0/data/mllib/sample_binary_classification_data.txt as our training and
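
A minimal sketch of the setup being described, assuming the Spark 1.1 MLlib API and an existing SparkContext `sc`; the split ratio, seed, and iteration count below are assumptions, not values from the post:

  import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
  import org.apache.spark.mllib.util.MLUtils

  // Load the LIBSVM-formatted sample data shipped with Spark and split it.
  val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")
  val Array(training, test) = data.randomSplit(Array(0.6, 0.4), seed = 11L)
  training.cache()

  // Train on the same training set with the two optimizers.
  val sgdModel = LogisticRegressionWithSGD.train(training, 100)
  val lbfgsModel = new LogisticRegressionWithLBFGS().run(training)

  // The two weight vectors can differ even when test accuracy looks similar.
  println(s"SGD   weights: ${sgdModel.weights}  intercept: ${sgdModel.intercept}")
  println(s"LBFGS weights: ${lbfgsModel.weights}  intercept: ${lbfgsModel.intercept}")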