Re: Spark LogisticRegression returns scaled coefficients

2015-11-18 Thread robert_dodier
njoshi wrote
> I am testing LogisticRegression performance on synthetically
> generated data.

Hmm, seems like a good idea. Can you post the code that generates the
training data?

best,

Robert Dodier



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-LogisticRegression-returns-scaled-coefficients-tp25405p25421.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread njoshi
I am testing LogisticRegression performance on synthetically generated
data. The weights I use as input are

   w = [2, 3, 4]

with no intercept and three features. After training on 1000 synthetically
generated data points, each drawn from a random normal distribution, the Spark
LogisticRegression model I obtain has the weights

 [6.005520656096823,9.35980263762698,12.203400879214152]

Each weight is scaled by a factor close to 3 relative to the
original values, and I cannot work out why. The code is
simple enough:


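/*
 * Logistic Regression model
 */
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setMaxIter(50)
  .setRegParam(0.001)        // regularization strength
  .setElasticNetParam(0.95)  // mostly L1 (1.0 = pure L1, 0.0 = pure L2)
  .setFitIntercept(false)    // no intercept term

// trainingData is the synthetic data set described above
val lrModel = lr.fit(trainingData)

println(s"${lrModel.weights}")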



I would greatly appreciate it if someone could shed some light on what is
going wrong here.

with kind regards, Nikhil







Re: Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread DB Tsai
How do you compute the probability given the weights? Also, given a
probability, you need to sample positive and negative labels according to
that probability; how do you do this? I'm pretty sure that LoR (logistic
regression) will give you the correct weights. Please see
generateMultinomialLogisticInput in
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
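
The sampling step asked about above can be sketched in plain Scala, without
Spark. This is only an illustration under stated assumptions: standard normal
features, no intercept, and hypothetical names (SyntheticLogisticData,
generate, disagreements) that come neither from this thread nor from Spark's
test suite. With sample = true the label is a Bernoulli draw with probability
sigmoid(w . x); with sample = false the label is a deterministic threshold on
w . x, which produces perfectly separable data.

```scala
import scala.util.Random

// Hypothetical helper object, not part of Spark.
object SyntheticLogisticData {
  // Logistic (sigmoid) function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Generate n (label, features) pairs with standard normal features.
  // sample = true : label ~ Bernoulli(sigmoid(w . x))  (what LoR assumes)
  // sample = false: label = 1.0 iff w . x > 0          (perfectly separable)
  def generate(w: Array[Double], n: Int, seed: Long,
               sample: Boolean): Seq[(Double, Array[Double])] = {
    val rng = new Random(seed)
    Seq.fill(n) {
      val x = Array.fill(w.length)(rng.nextGaussian())
      val z = dot(w, x)
      val label =
        if (sample) { if (rng.nextDouble() < sigmoid(z)) 1.0 else 0.0 }
        else { if (z > 0.0) 1.0 else 0.0 }
      (label, x)
    }
  }

  // Number of points whose label disagrees with the sign of w . x.
  def disagreements(w: Array[Double],
                    data: Seq[(Double, Array[Double])]): Int =
    data.count { case (y, x) => (dot(w, x) > 0.0) != (y == 1.0) }

  def main(args: Array[String]): Unit = {
    val w = Array(2.0, 3.0, 4.0)
    val sampled = generate(w, 1000, seed = 42L, sample = true)
    val thresholded = generate(w, 1000, seed = 42L, sample = false)
    println(s"sampled: ${disagreements(w, sampled)} labels disagree with sign(w . x)")
    println(s"thresholded: ${disagreements(w, thresholded)} labels disagree with sign(w . x)")
  }
}
```

The generated pairs would still have to be converted into whatever format
trainingData uses before being passed to lr.fit.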

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D





Re: Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread Nikhil Joshi
Hi,

Wonderful. I was sampling the output, but with a bug. Your comment made me
realize it :). I had indeed fallen victim to the complete separability
issue :).
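
For readers wondering why complete separability inflates the weights: when
every label is fully determined by the sign of w . x, scaling the weights up
always increases the likelihood, so the fit keeps growing them along the same
direction until regularization (or the iteration limit) stops it. The thread
does not show the actual data-generation bug, so the following plain-Scala
sketch only illustrates the general effect. The names (SeparabilityDemo,
separableData, logLikelihood) are hypothetical, and standard normal features
are an assumption.

```scala
import scala.util.Random

// Hypothetical demo object, not part of Spark.
object SeparabilityDemo {
  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Numerically stable log(sigmoid(z)).
  def logSigmoid(z: Double): Double =
    if (z >= 0) -math.log1p(math.exp(-z)) else z - math.log1p(math.exp(z))

  // Deterministic labels: 1.0 exactly when w . x > 0 (perfectly separable).
  def separableData(w: Array[Double], n: Int,
                    seed: Long): Seq[(Double, Array[Double])] = {
    val rng = new Random(seed)
    Seq.fill(n) {
      val x = Array.fill(w.length)(rng.nextGaussian())
      (if (dot(w, x) > 0.0) 1.0 else 0.0, x)
    }
  }

  // Unregularized logistic log-likelihood of weights w on the data.
  def logLikelihood(w: Array[Double],
                    data: Seq[(Double, Array[Double])]): Double =
    data.map { case (y, x) =>
      val z = dot(w, x)
      if (y == 1.0) logSigmoid(z) else logSigmoid(-z)
    }.sum

  def main(args: Array[String]): Unit = {
    val w = Array(2.0, 3.0, 4.0)
    val data = separableData(w, 1000, seed = 7L)
    // The log-likelihood keeps rising as the same direction is scaled up.
    for (c <- Seq(1.0, 3.0, 10.0)) {
      println(s"scale=$c logLikelihood=${logLikelihood(w.map(_ * c), data)}")
    }
  }
}
```

On such data the unregularized optimum lies at infinite scale, which is why
the fitted weights can come back as a multiple of the intended ones while
keeping roughly the same ratios.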

Thanks a lot.
with regards,
Nikhil



