Re: Regularization parameters
Hi,

I am following the code in examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala. For setting the parameters and parsing the command-line options, I am just reusing that code. Params is defined as follows:

    case class Params(
        input: String = null,
        numIterations: Int = 100,
        stepSize: Double = 1.0,
        algorithm: Algorithm = LR,
        regType: RegType = L2,
        regParam: Double = 0.1)

I use the command-line option --regType to choose L1 or L2, and --regParam to set it to 0.0. The option parser in the example above parses the options and creates the LogisticRegression object. It calls setRegParam(regParam) to set the regularization parameter and sets the updater according to the regType.

To run LR, I am again using the code in the example above (algorithm.run(training).clearThreshold()). The code in the example computes the AUC. To compute the accuracy of the test-data classification, I map a prediction < 0.5 to class 0, else it is mapped to class 1. Then I compare the predictions with the corresponding labels, and the number of matches is given by correctCount:

    val accuracy = correctCount.toDouble / predictionAndLabel.count

thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Regularization-parameters-tp11601p11627.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
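For reference, the wiring described above looks roughly like this in the MLlib 1.x API (a condensed sketch modeled on the BinaryClassification example; `params`, `training`, and `test` are assumed to be in scope, with `training` and `test` being RDD[LabeledPoint]):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.{L1Updater, SquaredL2Updater}

// Choose the updater from the --regType option.
val updater = params.regType match {
  case L1 => new L1Updater()
  case L2 => new SquaredL2Updater()
}

val algorithm = new LogisticRegressionWithSGD()
algorithm.optimizer
  .setNumIterations(params.numIterations)
  .setStepSize(params.stepSize)
  .setUpdater(updater)
  .setRegParam(params.regParam)   // --regParam, e.g. 0.0

// clearThreshold() makes predict() return raw scores in [0, 1]
// instead of hard 0/1 labels, which is what AUC needs.
val model = algorithm.run(training).clearThreshold()

// Accuracy at a 0.5 cutoff on the raw scores.
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
val correctCount = predictionAndLabel.filter {
  case (score, label) => (if (score < 0.5) 0.0 else 1.0) == label
}.count()
val accuracy = correctCount.toDouble / predictionAndLabel.count()
```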
Re: Regularization parameters
Then this may be a bug. Do you mind sharing the dataset, so that we can use it to reproduce the problem?

-Xiangrui

On Thu, Aug 7, 2014 at 1:20 AM, SK skrishna...@gmail.com wrote:
> Spark 1.0.1
>
> thanks
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Regularization-parameters-tp11601p11631.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Regularization parameters
What is the definition of regParam, and what is the range of values it is allowed to take?

thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Regularization-parameters-tp11601p11737.html
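For context on this question: as I understand the standard formulation used by MLlib's updaters (my paraphrase, not a quote from this thread or the docs), regParam is the weight λ on the penalty term in the training objective,

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} L\!\left(w;\, x_i, y_i\right) \;+\; \lambda\, R(w),
\qquad \lambda = \texttt{regParam} \ge 0,
```

where $L$ is the logistic loss and $R(w) = \lVert w \rVert_1$ for L1 or $R(w) = \tfrac{1}{2}\lVert w \rVert_2^2$ for L2. Any non-negative value is allowed; λ = 0 should make the penalty term vanish entirely.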
Re: Regularization parameters
Hi,

That is interesting. Would you please share some code showing how you are setting the regularization type and regularization parameters, and running Logistic Regression?

Thanks,
Burak

----- Original Message -----
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Wednesday, August 6, 2014 6:18:43 PM
Subject: Regularization parameters

Hi,

I tried different regularization parameter values with Logistic Regression for binary classification of my dataset, and would like to understand the following results:

    regType = L2, regParam = 0.0 : AUC = 0.80 and accuracy of 80%
    regType = L1, regParam = 0.0 : AUC = 0.80 and accuracy of 50%

To calculate accuracy I am using 0.5 as the threshold: a prediction < 0.5 is class 0, and a prediction >= 0.5 is class 1.

regParam = 0.0 implies I am not using any regularization, is that correct? If so, it should not matter whether I specify L1 or L2; I should get the same results. So why is the accuracy value different?

thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Regularization-parameters-tp11601.html
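The expectation stated above can be made concrete: MLlib's L1 update is a soft-thresholding (proximal) step, and with a zero shrinkage amount that step is the identity, so an L1 run with regParam = 0.0 should in principle match the unregularized run exactly. A minimal pure-Scala sketch of the operator (a hypothetical helper for illustration, not the MLlib code itself):

```scala
// Soft-thresholding: the proximal operator of the L1 penalty.
// The shrinkage amount is proportional to regParam, so with
// regParam = 0.0 the operator leaves the weights unchanged.
def softThreshold(w: Array[Double], shrinkage: Double): Array[Double] =
  w.map(wi => math.signum(wi) * math.max(0.0, math.abs(wi) - shrinkage))

val w = Array(-1.5, 0.2, 0.0, 3.0)
assert(softThreshold(w, 0.0).sameElements(w))  // identity when shrinkage is 0
// With shrinkage 0.5, small weights are zeroed and large ones shrink:
// softThreshold(w, 0.5) gives Array(-1.0, 0.0, 0.0, 2.5)
```

If the two runs nevertheless differ at regParam = 0.0, the discrepancy has to come from somewhere other than the penalty itself, which is why sharing the configuration code helps.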
Re: Regularization parameters
One possible straightforward explanation: your solutions might be stuck in different local minima, and depending on the weight initialization you are getting different parameters. Maybe use the same initial weights for both runs, or I would probably test the execution on a synthetic dataset with a known global solution.

On Wed, Aug 6, 2014 at 7:12 PM, Burak Yavuz bya...@stanford.edu wrote:
> That is interesting. Would you please share some code on how you are
> setting the regularization type, regularization parameters and running
> Logistic Regression?

--
Mohit

"When you want success as badly as you want the air, then you will get it. There is no other secret of success." -Socrates
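The same-initial-weights experiment suggested above can be done through the MLlib 1.x API, which accepts explicit initial weights (a sketch assuming `training` is an RDD[LabeledPoint] and `algorithmL1`/`algorithmL2` are two LogisticRegressionWithSGD instances configured as in the earlier messages):

```scala
import org.apache.spark.mllib.linalg.Vectors

// Start both runs from the same point (all zeros here), so any
// difference in the fitted models cannot come from initialization.
val numFeatures = training.first().features.size
val initialWeights = Vectors.dense(new Array[Double](numFeatures))

val modelL1 = algorithmL1.run(training, initialWeights)
val modelL2 = algorithmL2.run(training, initialWeights)
```

One caveat worth noting: the L1- and L2-regularized logistic losses are convex, so genuinely distinct local minima are unlikely; run-to-run variation from the stochastic sampling in SGD, or an ill-suited step size, is a more plausible source of differences.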