Sounds great.
Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Mon, Dec 22, 2014 at 5:27 AM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:

> Thanks again DB Tsai, LogisticRegressionWithLBFGS works for me!
>
> From: Franco Barrientos [mailto:franco.barrien...@exalitica.com]
> Sent: Thursday, December 18, 2014 16:42
> To: 'DB Tsai'
> CC: 'Sean Owen'; user@spark.apache.org
> Subject: RE: Effects problems in logistic regression
>
> Thanks, I will try.
>
> From: DB Tsai [mailto:dbt...@dbtsai.com]
> Sent: Thursday, December 18, 2014 16:24
> To: Franco Barrientos
> CC: Sean Owen; user@spark.apache.org
> Subject: Re: Effects problems in logistic regression
>
> Can you try LogisticRegressionWithLBFGS? I verified that it converges to the same result as the one trained by R's glmnet package without regularization. The problem with LogisticRegressionWithSGD is that it is very slow in terms of convergence, and much of the time it is very sensitive to the step size, which can lead to a wrong answer.
>
> The regularization logic in MLlib is not entirely correct: it also penalizes the intercept. In general, with really high regularization, all the coefficients will be zero except the intercept. In logistic regression, the non-zero intercept can be understood as the prior probability of each class, and in linear regression it will be the mean of the response. I'll have a PR to fix this issue.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
> On Thu, Dec 18, 2014 at 10:50 AM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:
>
> Yes, without the "amounts" variables the results are similar. When I put in other variables it's fine.
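[Editor's note: DB Tsai's remark above, that the intercept of a logistic model encodes the class prior, can be checked in a few lines of plain Python. This is a sketch with made-up numbers, not MLlib code: an intercept-only logistic model fit by gradient descent recovers the log-odds of the prior, log(p / (1 - p)).]

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed class prior: 25% positive examples.
p = 0.25

# Fit an intercept-only logistic model by gradient descent on the
# average negative log-likelihood; the gradient w.r.t. the intercept
# is simply sigmoid(b) - p.
b = 0.0
for _ in range(500):
    b -= sigmoid(b) - p

# The fitted intercept matches the log-odds of the prior.
print(b, math.log(p / (1 - p)))  # both ~ -1.0986
```

This is why, with heavy regularization that zeroes every coefficient but leaves the intercept unpenalized, the model degenerates to predicting the class prior.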
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Thursday, December 18, 2014 14:22
> To: Franco Barrientos
> CC: user@spark.apache.org
> Subject: Re: Effects problems in logistic regression
>
> Are you sure this is an apples-to-apples comparison? For example, does your SAS process normalize or otherwise transform the data first?
>
> Is the optimization configured similarly in both cases -- same regularization, etc.?
>
> Are you sure you are pulling out the intercept correctly? It is a separate value from the logistic regression model in Spark.
>
> On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:
>
> Hi all!
>
> I have a problem with LogisticRegressionWithSGD. When I train a data set with one variable (which is the amount of an item) plus an intercept, I get weights of (-0.4021, -207.1749) for the two terms, respectively. This doesn't make sense to me, because when I run a logistic regression on the same data in SAS I get the weights (-2.6604, 0.000245).
>
> The range of this variable is from 0 to 59102, with a mean of 1158.
>
> The problem is when I want to calculate the probabilities for each user in the data set: this probability is near zero, or exactly zero, in many cases, because when Spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) this is a huge number -- in fact infinity for Spark.
>
> How can I treat this variable? And why did this happen?
>
> Thanks,
>
> Franco Barrientos
> Data Scientist
>
> Málaga #115, Of. 1003, Las Condes.
> Santiago, Chile.
> (+562)-29699649
> (+569)-76347893
>
> franco.barrien...@exalitica.com
> www.exalitica.com
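[Editor's note: the overflow Franco describes can be reproduced with the numbers from the thread in plain Python (a sketch, not Spark code). The naive 1/(1 + exp(-margin)) formula overflows for a large negative margin, while a numerically stable sigmoid does not; but even the stable version returns a probability of essentially zero, because the real problem is the badly scaled weight that SGD produced, which is why LBFGS, or standardizing the feature, fixes it.]

```python
import math

w, b = -207.1749, -0.4021   # the SGD weights from the thread
amount = 1158               # the mean of the "amount" feature

margin = b + w * amount     # about -2.4e5

# Naive formula: exp(-margin) = exp(+239908...) overflows -- this is
# the "infinity" Franco sees in Spark.
try:
    prob = 1.0 / (1.0 + math.exp(-margin))
except OverflowError:
    prob = None

# A numerically stable sigmoid branches on the sign of z so the
# exponential argument is never positive and cannot overflow.
def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(prob, sigmoid(margin))  # None 0.0
```

With a sane weight like the SAS fit (0.000245), the margin at the mean amount is on the order of -2.4 and the probability is perfectly well-behaved.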