Re: [MLLib] Logistic Regression and standardization

2018-04-28 Thread Valeriy Avanesov
Hi Joseph, I've just tried that out. MLlib indeed returns different models. I see no problem here, then. How can Filipp's issue be possible? Best, Valeriy. On 04/27/2018 10:00 PM, Valeriy Avanesov wrote: Hi all, maybe I'm missing something, but from what was discussed here I've gathered …

Re: [MLLib] Logistic Regression and standardization

2018-04-28 Thread Joseph PENG
Hi Valeriy, Let me make sure we are on the same page. "the current mllib implementation returns exactly the same model whether standardization is turned on or off." This should be corrected as "the current mllib implementation returns exactly the same model whether standardization is turned on …

Re: [MLLib] Logistic Regression and standardization

2018-04-27 Thread Valeriy Avanesov
Hi all, maybe I'm missing something, but from what was discussed here I've gathered that the current mllib implementation returns exactly the same model whether standardization is turned on or off. I suggest considering an R script (please see below) which trains two penalized logistic …

Re: [MLLib] Logistic Regression and standardization

2018-04-24 Thread DB Tsai
As I’m one of the original authors, let me chime in with some comments. Without standardization, L-BFGS will be unstable. For example, if a feature is multiplied by 10, then the corresponding coefficient should be divided by 10 to make the same prediction. But without standardization, L-BFGS will …
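
The scaling relationship described above can be checked numerically. The sketch below is hypothetical toy code (not MLlib's implementation): a tiny logistic regression fitted by Newton's method, which is affine-invariant, so multiplying a feature by 10 divides its fitted coefficient by exactly 10 while predictions stay the same. Note that the instability mentioned in the thread concerns quasi-Newton solvers such as L-BFGS on badly scaled data; Newton's method is used here only because it makes the coefficient relationship easy to see.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Unregularized logistic regression fitted by Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        W = p * (1.0 - p)                  # IRLS weights
        H = X.T @ (X * W[:, None])         # Hessian of the log-likelihood
        g = X.T @ (p - y)                  # gradient
        w -= np.linalg.solve(H, g)
    return w

# Toy, well-overlapping (non-separable) data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -1.0]))))
y = rng.binomial(1, p_true).astype(float)

w = fit_logistic(X, y)

X10 = X.copy()
X10[:, 0] *= 10.0                          # multiply feature 0 by 10
w10 = fit_logistic(X10, y)

print(w[0] / w10[0])                       # ratio is (numerically) 10
```

Because Newton steps transform exactly under a rescaling of the design matrix, the fitted coefficient of the scaled feature shrinks by precisely the scaling factor, and the coefficient of the untouched feature is unchanged.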

Re: [MLLib] Logistic Regression and standardization

2018-04-20 Thread Weichen Xu
Right. If the regularization term isn't zero, then enabling/disabling standardization will give different results. But if we compare results between R's glmnet and MLlib with the same parameters for regularization/standardization/..., then we should get the same result. If not, then maybe there's a …
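
The first point above, that a nonzero regularization term makes standardization change the model, can be illustrated with a toy L2-penalized solver (hypothetical numpy code, not MLlib's implementation): the penalty is applied to the coefficients in whichever space training happens, so penalizing raw-scale coefficients and penalizing standardized coefficients (then scaling back) no longer coincide.

```python
import numpy as np

def fit_logistic_l2(X, y, lam, n_iter=50):
    """L2-penalized logistic regression fitted by Newton's method."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + lam * np.eye(d)  # penalized Hessian
        g = X.T @ (p - y) + lam * w                   # penalized gradient
        w -= np.linalg.solve(H, g)
    return w

# Toy data with features on very different scales
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2)) * np.array([5.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, -2.0]))))
y = rng.binomial(1, p_true).astype(float)

lam = 5.0
w_raw = fit_logistic_l2(X, y, lam)                    # penalty on raw-scale coefficients

sigma = X.std(axis=0)
w_std = fit_logistic_l2(X / sigma, y, lam) / sigma    # penalty on standardized coefficients

print(w_raw, w_std)                                   # the two penalized models differ
```

The penalty shrinks each coefficient relative to the scale it is expressed in, so the feature with the small raw scale (and hence large raw coefficient) is shrunk much harder when the penalty acts in the raw space than when it acts in the standardized space.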

Re: [MLLib] Logistic Regression and standardization

2018-04-20 Thread Valeriy Avanesov
Hi all. Filipp, do you use L1/L2/elastic-net penalization? I believe standardization matters in that case. Best, Valeriy. On 04/17/2018 11:40 AM, Weichen Xu wrote: Not a bug. When disabling standardization, MLlib LR will still do standardization for features, but it will scale the …

Re: [MLLib] Logistic Regression and standardization

2018-04-17 Thread Weichen Xu
Not a bug. When disabling standardization, MLlib LR will still do standardization for features, but it will scale the coefficients back at the end (after training has finished), so it gets the same result as training without standardization. The purpose of this is to improve the rate of convergence. So …
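
The scale-back trick described here can be sketched in plain numpy (a hypothetical toy solver, not MLlib's code): fit on features divided by their standard deviations, then divide the fitted coefficients by the same standard deviations. Without a regularization term this recovers exactly the model fitted on the raw features, which is why disabling standardization still gives the "no standardization" answer. For simplicity the sketch only scales the features and does not center them.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Unregularized logistic regression fitted by Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        W = p * (1.0 - p)
        w -= np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (p - y))
    return w

# Toy data with features on very different scales
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([5.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, -2.0]))))
y = rng.binomial(1, p_true).astype(float)

sigma = X.std(axis=0)
w_std = fit_logistic(X / sigma, y)   # train in the standardized space ...
w_back = w_std / sigma               # ... then scale the coefficients back

w_direct = fit_logistic(X, y)        # train directly on the raw features

print(np.max(np.abs(w_back - w_direct)))   # tiny: the two models coincide
```

The equivalence holds because dividing a column of X by sigma and multiplying the corresponding coefficient by sigma leaves every prediction X @ w unchanged; the better-conditioned standardized problem is simply easier for the optimizer.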

Re: [MLLib] Logistic Regression and standardization

2018-04-13 Thread Yanbo Liang
Hi Filipp, MLlib’s LR implementation handles standardization the same way as R’s glmnet. Actually, you don’t need to care about the implementation details: the coefficients are always returned on the original scale, so it should return the same results as other popular ML libraries. Could …

[MLLib] Logistic Regression and standardization

2018-04-08 Thread Filipp Zhinkin
Hi all, While migrating from a custom LR implementation to MLlib's LR implementation, my colleagues noticed that prediction quality dropped (according to different business metrics). It turned out that this issue was caused by the feature standardization performed by MLlib's LR: regardless of …