Re: Spark LogisticRegression returns scaled coefficients
njoshi wrote
> I am testing the LogisticRegression performance on a synthetically
> generated data.

Hmm, seems like a good idea. Can you give the code for generating the training data?

best,
Robert Dodier

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-LogisticRegression-returns-scaled-coefficients-tp25405p25421.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Spark LogisticRegression returns scaled coefficients
I am testing the LogisticRegression performance on synthetically generated data. The weights I have as input are

    w = [2, 3, 4]

with no intercept and three features. After training on 1000 synthetically generated data points, assuming a random normal distribution for each feature, the Spark LogisticRegression model I obtain has weights

    [6.005520656096823,9.35980263762698,12.203400879214152]

I can see that each weight is scaled by a factor close to 3 w.r.t. the original values. I am unable to guess the reason behind this. The code is simple enough:

    import org.apache.spark.ml.classification.LogisticRegression

    /*
     * Logistic Regression model
     */
    val lr = new LogisticRegression()
      .setMaxIter(50)
      .setRegParam(0.001)
      .setElasticNetParam(0.95)   // elastic-net mix: 1.0 is pure L1, 0.0 is pure L2
      .setFitIntercept(false)     // the generating model has no intercept

    val lrModel = lr.fit(trainingData)

    println(s"${lrModel.weights}")

I would greatly appreciate it if someone could shed some light on what's fishy here.

with kind regards,
Nikhil
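The "factor close to 3" can be checked by dividing each fitted weight by the corresponding generating weight. A minimal sketch in plain Scala follows; the object and method names (`CoefficientRatios`, `ratios`) are illustrative and not part of the original post, and only the two weight vectors come from the message above.

```scala
object CoefficientRatios {
  // Generating weights and fitted weights, copied from the post above.
  val trueWeights: Array[Double] = Array(2.0, 3.0, 4.0)
  val fittedWeights: Array[Double] =
    Array(6.005520656096823, 9.35980263762698, 12.203400879214152)

  // Element-wise ratio fitted / true (hypothetical helper for illustration).
  def ratios(fitted: Array[Double], truth: Array[Double]): Array[Double] =
    fitted.zip(truth).map { case (f, t) => f / t }

  def main(args: Array[String]): Unit =
    println(ratios(fittedWeights, trueWeights).mkString(", "))
}
```

All three ratios land slightly above 3, so the fitted vector points in nearly the same direction as the generating one and differs mainly in magnitude.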
Re: Spark LogisticRegression returns scaled coefficients
How do you compute the probability given the weights? Also, given a probability, you need to sample positive and negative labels based on that probability; how do you do this? I'm pretty sure that LoR will give you the correct weights. Please see generateMultinomialLogisticInput in

https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Tue, Nov 17, 2015 at 4:11 PM, njoshi <nikhil.jo...@teamaol.com> wrote:
> I am testing the LogisticRegression performance on a synthetically generated
> data. The weights I have as input are
>
>     w = [2, 3, 4]
>
> with no intercept and three features. [...]
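The two questions above (how the probability is computed, and how labels are sampled from it) can be sketched as follows. This is a minimal plain-Scala illustration under stated assumptions, not the code from LogisticRegressionSuite and not the poster's generator: the names `SyntheticLogisticData`, `sigmoid`, and `generate` are hypothetical, features are assumed to be independent standard normals, and there is no intercept.

```scala
import scala.util.Random

object SyntheticLogisticData {
  // Logistic function: maps a margin w.x to P(label = 1 | x).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Returns n (label, features) pairs. Each label is a Bernoulli draw with
  // success probability sigmoid(w.x); it is NOT a threshold on that probability.
  def generate(weights: Array[Double], n: Int, seed: Long): Seq[(Double, Array[Double])] = {
    val rng = new Random(seed)
    Seq.fill(n) {
      val x = Array.fill(weights.length)(rng.nextGaussian())
      val margin = x.zip(weights).map { case (xi, wi) => xi * wi }.sum
      val p = sigmoid(margin)
      val label = if (rng.nextDouble() < p) 1.0 else 0.0
      (label, x)
    }
  }

  def main(args: Array[String]): Unit = {
    val data = generate(Array(2.0, 3.0, 4.0), n = 1000, seed = 42L)
    println(s"positives: ${data.count(_._1 == 1.0)} of ${data.size}")
  }
}
```

Because the label is drawn at random, some points with a positive margin still receive label 0 and vice versa. That overlap is what lets the maximum-likelihood fit settle on finite weights close to the generating ones.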
Re: Spark LogisticRegression returns scaled coefficients
Hi,

Wonderful. I was sampling the output, but with a bug. Your comment brought the realization :). I was indeed victimized by the complete separability issue :).

Thanks a lot.

with regards,
Nikhil

On Tue, Nov 17, 2015 at 5:26 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> How do you compute the probability given the weights? Also, given a
> probability, you need to sample positive and negative based on the
> probability, and how do you do this? [...]

--
Nikhil Joshi
Princ Data Scientist
Aol PLATFORMS.
395 Page Mill Rd, Palo Alto, CA 94306-2024
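The complete separability issue mentioned above can be illustrated with a small sketch. The thread does not say what the sampling bug actually was; the generator below assumes one plausible form of it, namely assigning the label by thresholding the margin instead of drawing it at random. All names (`SeparabilityDemo`, `thresholded`, `logLikelihood`) are hypothetical, and the code is plain Scala rather than Spark.

```scala
import scala.util.Random

object SeparabilityDemo {
  // Numerically stable log(sigmoid(z)).
  def logSigmoid(z: Double): Double =
    if (z >= 0) -math.log1p(math.exp(-z)) else z - math.log1p(math.exp(z))

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Assumed buggy generator: label = 1 exactly when w.x > 0, so the classes
  // are perfectly separated by the hyperplane w.x = 0.
  def thresholded(w: Array[Double], n: Int, seed: Long): Seq[(Double, Array[Double])] = {
    val rng = new Random(seed)
    Seq.fill(n) {
      val x = Array.fill(w.length)(rng.nextGaussian())
      (if (dot(w, x) > 0) 1.0 else 0.0, x)
    }
  }

  // Unregularized logistic log-likelihood of the data under weights w.
  def logLikelihood(data: Seq[(Double, Array[Double])], w: Array[Double]): Double =
    data.map { case (y, x) =>
      val m = dot(w, x)
      logSigmoid(if (y == 1.0) m else -m)
    }.sum

  def main(args: Array[String]): Unit = {
    val w = Array(2.0, 3.0, 4.0)
    val data = thresholded(w, 1000, seed = 42L)
    // On separable data, scaling w up always increases the log-likelihood.
    for (c <- Seq(1.0, 3.0, 10.0))
      println(s"scale $c: log-likelihood = ${logLikelihood(data, w.map(_ * c))}")
  }
}
```

On data like this the unregularized optimum lies at infinity along the direction of w, so only the penalty term keeps the fitted weights finite. That is consistent with coefficients that keep the right direction but come out uniformly inflated, although the exact factor depends on the regularization settings and the sample.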