In Spark 1.4, Logistic Regression with elastic net is implemented in the ML pipeline framework. Model selection can be achieved through a high lambda, which drives many of the coefficients to zero.
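For reference, a minimal sketch of what that looks like with the ML pipeline API (`org.apache.spark.ml.classification.LogisticRegression` in Spark 1.4; the parameter values below are illustrative, not recommendations, and `training` is assumed to be a DataFrame you already have with "label" and "features" columns):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Elastic net mixes L1 and L2 regularization:
//   elasticNetParam = 0.0 -> pure L2 (ridge), 1.0 -> pure L1 (lasso).
// A value near 1.0 combined with a larger regParam (lambda) drives many
// coefficients to exactly zero, which acts as built-in feature selection.
val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.3)          // lambda: illustrative, tune via cross-validation
  .setElasticNetParam(0.8)   // mostly L1, so weak features get zeroed out

val model = lr.fit(training)

// Features whose weight is exactly zero were effectively dropped
// by the L1 penalty. (In Spark 1.4 the field is `weights`; later
// releases renamed it to `coefficients`.)
println(s"Weights: ${model.weights} Intercept: ${model.intercept}")
```

Inspecting which entries of the weight vector are zero answers the original question about which variables the model keeps, without manually deleting columns and refitting.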
Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com

On Fri, May 22, 2015 at 1:19 AM, SparknewUser <melanie.galloi...@gmail.com> wrote:
> I am new to MLlib and to Spark (I use Scala).
>
> I'm trying to understand how LogisticRegressionWithLBFGS and
> LogisticRegressionWithSGD work.
> I usually use R for logistic regression, but now I am doing it on Spark
> to be able to analyze big data.
>
> The model only returns weights and an intercept. My problem is that I have
> no information about which variables are significant and which variables I
> should delete to improve my model. I only have the confusion matrix and the
> AUC to evaluate the performance.
>
> Is there any way to get information about the variables I put in my model?
> How can I try different variable combinations: do I have to modify the
> original dataset (e.g. delete one or several columns)?
> How are the weights calculated: is there a correlation calculation with the
> variable of interest?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-how-to-get-the-best-model-with-only-the-most-significant-explanatory-variables-in-LogisticRegr-tp22993.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------