Sounds great.
Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Mon, Dec 22, 2014 at 5:27 AM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:

> Thanks again DB Tsai, LogisticRegressionWithLBFGS works for me!
>
> From: Franco Barrientos [mailto:franco.barrien...@exalitica.com]
> Sent: Thursday, December 18, 2014 16:42
> To: 'DB Tsai'
> CC: 'Sean Owen'; user@spark.apache.org
> Subject: RE: Effects problems in logistic regression
>
> Thanks, I will try.
>
> From: DB Tsai [mailto:dbt...@dbtsai.com]
> Sent: Thursday, December 18, 2014 16:24
> To: Franco Barrientos
> CC: Sean Owen; user@spark.apache.org
> Subject: Re: Effects problems in logistic regression
>
> Can you try LogisticRegressionWithLBFGS? I verified that it converges to the same result as the one trained by R's glmnet package without regularization. The problem with LogisticRegressionWithSGD is that it is very slow in terms of convergence, and much of the time it is very sensitive to the step size, which can lead to a wrong answer.
>
> The regularization logic in MLlib is not entirely correct: it also penalizes the intercept. In general, with really high regularization, all the coefficients will be zero except the intercept. In logistic regression, the non-zero intercept can be understood as the prior probability of each class, and in linear regression it will be the mean of the response. I'll have a PR to fix this issue.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
> On Thu, Dec 18, 2014 at 10:50 AM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:
>
> Yes, without the "amounts" variables the results are similar. When I put in other variables it's fine.
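[Editor's note: DB Tsai's remark above, that the intercept of a logistic model encodes the class prior, can be checked in a few lines of plain Python. This is a sketch with made-up numbers, not MLlib code: an intercept-only logistic model fit by gradient descent recovers the log-odds of the prior, log(p / (1 - p)).]

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed class prior: 25% positive examples.
p = 0.25

# Fit an intercept-only logistic model by gradient descent on the
# average negative log-likelihood; the gradient w.r.t. the intercept
# is simply sigmoid(b) - p.
b = 0.0
for _ in range(500):
    b -= sigmoid(b) - p

# The fitted intercept matches the log-odds of the prior.
print(b, math.log(p / (1 - p)))  # both ~ -1.0986
```

This is why, with heavy regularization that zeroes every coefficient but leaves the intercept unpenalized, the model degenerates to predicting the class prior.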
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Thursday, December 18, 2014 14:22
> To: Franco Barrientos
> CC: user@spark.apache.org
> Subject: Re: Effects problems in logistic regression
>
> Are you sure this is an apples-to-apples comparison? For example, does your SAS process normalize or otherwise transform the data first?
>
> Is the optimization configured similarly in both cases -- same regularization, etc.?
>
> Are you sure you are pulling out the intercept correctly? It is a separate value from the logistic regression model in Spark.
>
> On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos <franco.barrien...@exalitica.com> wrote:
>
> Hi all!
>
> I have a problem with LogisticRegressionWithSGD. When I train a data set with one variable (which is the amount of an item) plus an intercept, I get weights of (-0.4021, -207.1749) for the two terms, respectively. This doesn't make sense to me, because when I run a logistic regression on the same data in SAS I get the weights (-2.6604, 0.000245).
>
> The range of this variable is from 0 to 59102, with a mean of 1158.
>
> The problem is when I want to calculate the probabilities for each user in the data set: this probability is near zero, or exactly zero, in many cases, because when Spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) this is a huge number -- in fact infinity for Spark.
>
> How can I treat this variable? And why did this happen?
>
> Thanks,
>
> Franco Barrientos
> Data Scientist
>
> Málaga #115, Of. 1003, Las Condes.
> Santiago, Chile.
> (+562)-29699649
> (+569)-76347893
>
> franco.barrien...@exalitica.com
> www.exalitica.com
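[Editor's note: the overflow Franco describes can be reproduced with the numbers from the thread in plain Python (a sketch, not Spark code). The naive 1/(1 + exp(-margin)) formula overflows for a large negative margin, while a numerically stable sigmoid does not; but even the stable version returns a probability of essentially zero, because the real problem is the badly scaled weight that SGD produced, which is why LBFGS, or standardizing the feature, fixes it.]

```python
import math

w, b = -207.1749, -0.4021   # the SGD weights from the thread
amount = 1158               # the mean of the "amount" feature

margin = b + w * amount     # about -2.4e5

# Naive formula: exp(-margin) = exp(+239908...) overflows -- this is
# the "infinity" Franco sees in Spark.
try:
    prob = 1.0 / (1.0 + math.exp(-margin))
except OverflowError:
    prob = None

# A numerically stable sigmoid branches on the sign of z so the
# exponential argument is never positive and cannot overflow.
def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(prob, sigmoid(margin))  # None 0.0
```

With a sane weight like the SAS fit (0.000245), the margin at the mean amount is on the order of -2.4 and the probability is perfectly well-behaved.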