Re: MLlib Naive Bayes classifier confidence
That was it, thanks. (Posting here so that people with the same need know it's the right answer. :) )

sowen wrote:
> Probabilities won't sum to 1 since this expression doesn't incorporate
> the probability of the evidence, I imagine? It's constant across classes,
> so it is usually excluded. It would appear as a - log(P(evidence)) term.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p20361.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: MLlib Naive Bayes classifier confidence
Probabilities won't sum to 1 since this expression doesn't incorporate the probability of the evidence, I imagine? It's constant across classes, so it is usually excluded. It would appear as a - log(P(evidence)) term.

On Tue, Dec 2, 2014 at 10:44 AM, MariusFS marius.fete...@sien.com wrote:
> Are we sure that exponentiating will give us the probabilities? I did some
> tests by cloning the MLlib class and adding the required code, but the
> calculated probabilities do not add up to 1. I tried something like:
>
>   def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
>     // Per-class log scores: brzPi plus brzTheta times the feature vector.
>     val logProbs = brzPi + brzTheta * new BDV[Double](testData.toArray)
>     // Exponentiate each score (these values are not normalized).
>     val probs = logProbs.map(x => math.exp(x))
>     (logProbs, probs)
>   }
>
> This was because I need the actual probs to process downstream from this...
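A minimal sketch of the normalization this reply points at, assuming the per-class values are unnormalized log scores such as the logProbs vector in the quoted snippet. Dividing the exponentiated scores by their sum is the same as adding the - log(P(evidence)) term before exponentiating. Subtracting the maximum first is a common guard against underflow and does not change the result. The name posteriors and the sample numbers are made up for illustration; none of this is MLlib code.

```scala
// Hypothetical helper (not part of MLlib): turns unnormalized per-class
// log scores into probabilities that sum to 1.
def posteriors(logJoint: Array[Double]): Array[Double] = {
  // Shift by the maximum so the largest exponent is exp(0) = 1.
  val maxLog = logJoint.max
  val shifted = logJoint.map(l => math.exp(l - maxLog))
  // The sum plays the role of P(evidence), scaled by the same shift.
  val total = shifted.sum
  shifted.map(_ / total)
}

// Illustrative log scores for three classes (made-up values).
val probs = posteriors(Array(-1050.2, -1048.7, -1055.0))
println(probs.mkString(", "))
```

Exponentiating scores this small directly would underflow to 0.0 for every class, which is why the shift is applied before math.exp.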
Re: MLlib Naive Bayes classifier confidence
Are we sure that exponentiating will give us the probabilities? I did some tests by cloning the MLlib class and adding the required code, but the calculated probabilities do not add up to 1. I tried something like:

  def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
    // Per-class log scores: brzPi plus brzTheta times the feature vector.
    val logProbs = brzPi + brzTheta * new BDV[Double](testData.toArray)
    // Exponentiate each score (these values are not normalized).
    val probs = logProbs.map(x => math.exp(x))
    (logProbs, probs)
  }

This was because I need the actual probs to process downstream from this...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p20175.html
Re: MLlib Naive Bayes classifier confidence
Not directly. If you could access brzPi and brzTheta in the NaiveBayesModel, you could repeat the same computation it performs in predict() and exponentiate the result to get back class probabilities, since the values it stores and computes are in log space.

Hm, I wonder how people feel about exposing those fields, or adding a different method to expose class probabilities? It seems useful, since the information is conceptually directly available.

On Nov 10, 2014 5:46 AM, jatinpreet jatinpr...@gmail.com wrote:
> Hi,
>
> Is there a way to get the confidence value of a prediction with MLlib's
> implementation of Naive Bayes classification? I wish to eliminate the
> samples that were classified with low confidence.
>
> Thanks,
> Jatin
>
> - Novice Big Data Programmer
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456.html
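The computation described in this reply can be sketched without Spark or Breeze, assuming pi holds one log prior per class and theta holds one row of log feature likelihoods per class. The plain-array representation, the toy values, and the name logScores are assumptions made for illustration; this is not the MLlib implementation.

```scala
// Toy stand-ins for the model's parameters (assumed shapes, made-up values):
// pi(c) = log prior of class c, theta(c)(i) = log P(feature i | class c).
val pi: Array[Double] = Array(math.log(0.6), math.log(0.4))
val theta: Array[Array[Double]] = Array(
  Array(math.log(0.7), math.log(0.3)),
  Array(math.log(0.2), math.log(0.8))
)

// Unnormalized log score per class: pi(c) + sum over i of theta(c)(i) * x(i).
def logScores(x: Array[Double]): Array[Double] =
  pi.indices.map { c =>
    pi(c) + theta(c).zip(x).map { case (t, xi) => t * xi }.sum
  }.toArray

// Feature counts for one sample (made-up values).
val scores = logScores(Array(2.0, 1.0))
// Index of the highest-scoring class.
val predicted = scores.indexOf(scores.max)
println(s"scores = ${scores.mkString(", ")}, predicted class = $predicted")
```

Exponentiating these scores gives values proportional to the class probabilities; they would still need to be divided by their sum to add up to 1.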
Re: MLlib Naive Bayes classifier confidence
Thanks for the answer. The variables brzPi and brzTheta are declared private. I am writing my code in Java; otherwise I could have replicated the Scala class and performed the desired computation, which, as I observed, is a multiplication of brzTheta with the test vector followed by adding the result to brzPi.

Any suggestions for a way out other than replicating the whole functionality of the Naive Bayes model in Java? That would be a time-consuming process.

- Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p18472.html
Re: MLlib Naive Bayes classifier confidence
It's hacky, but you could access these fields via reflection. It'd be better to propose opening them up in a PR.

On Mon, Nov 10, 2014 at 9:25 AM, jatinpreet jatinpr...@gmail.com wrote:
> Thanks for the answer. The variables brzPi and brzTheta are declared
> private. I am writing my code in Java; otherwise I could have replicated
> the Scala class and performed the desired computation, which, as I
> observed, is a multiplication of brzTheta with the test vector followed
> by adding the result to brzPi.
>
> Any suggestions for a way out other than replicating the whole
> functionality of the Naive Bayes model in Java? That would be a
> time-consuming process.
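For the reflection route, here is a minimal sketch on a stand-in class. ToyModel and readPrivate are hypothetical names invented for this example; ToyModel is not the real NaiveBayesModel, and the compiled field name of a private Scala member is not guaranteed to match its source name in every case, so the lookup by name is an assumption to verify against the actual class.

```scala
// Stand-in for a class with a private field; NOT the real NaiveBayesModel.
class ToyModel {
  private val brzPi: Array[Double] = Array(-0.5, -0.9)
}

// Hypothetical helper: reads a private field by name via Java reflection.
def readPrivate[T](target: AnyRef, fieldName: String): T = {
  val field = target.getClass.getDeclaredField(fieldName)
  field.setAccessible(true) // bypass the private modifier
  field.get(target).asInstanceOf[T]
}

val piValues = readPrivate[Array[Double]](new ToyModel, "brzPi")
println(piValues.mkString(", "))
```

The same java.lang.reflect calls (getDeclaredField, setAccessible, get) are available from Java, which is why this approach does not require porting the Scala class.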
Re: MLlib Naive Bayes classifier confidence
Thanks, I will try it out and raise a request for making the variables accessible.

An unrelated question: do you think the probability value thus calculated will be a good measure of confidence in the prediction? I have been reading mixed opinions about this.

Jatin

- Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p18497.html