I have been trying the Naive Bayes implementation in Spark's MLlib. During the
testing phase, I would like to discard data whose predictions have low confidence.
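To make "confidence" concrete, here is a minimal sketch in plain Python (not MLlib code). It assumes the model's log priors and log conditional probabilities are available (MLlib's NaiveBayesModel stores them as pi and theta) and that features are term counts. The function names, the toy numbers, and the 0.9 cut-off are hypothetical and only for illustration.

```python
import math

def posterior(log_prior, log_theta, counts):
    """Normalized class posteriors for a multinomial Naive Bayes model.

    log_prior[c]    : log P(class c)
    log_theta[c][j] : log P(feature j | class c)
    counts[j]       : count of feature j in the document
    """
    scores = [
        lp + sum(x * lt for x, lt in zip(counts, row))
        for lp, row in zip(log_prior, log_theta)
    ]
    # Subtract the max before exponentiating to avoid underflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_reject(log_prior, log_theta, counts, min_conf):
    """Return (class index, confidence), or (None, confidence) if below min_conf."""
    probs = posterior(log_prior, log_theta, counts)
    best = max(range(len(probs)), key=lambda c: probs[c])
    if probs[best] >= min_conf:
        return best, probs[best]
    return None, probs[best]

# Toy two-class model with equal priors (hypothetical numbers).
LOG_PRIOR = [math.log(0.5), math.log(0.5)]
LOG_THETA = [
    [math.log(0.8), math.log(0.2)],
    [math.log(0.2), math.log(0.8)],
]

print(predict_or_reject(LOG_PRIOR, LOG_THETA, [3, 0], 0.9))  # confident, kept
print(predict_or_reject(LOG_PRIOR, LOG_THETA, [1, 1], 0.9))  # ambiguous, rejected
```

Because Naive Bayes multiplies many per-feature probabilities, the normalized posterior tends to be pushed towards 0 or 1, so the value is probably better read as a ranking score than as a calibrated probability.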

My data set primarily consists of form-based documents such as reports and
application forms. They contain key-value style text, so I assume the
independence condition holds better than it does for natural language.

As for the quality of the priors, I am not doing anything special. I am training
on a roughly equal number of samples for each class and have left the heavy
lifting to MLlib.

Given these facts, does it make sense to define a confidence threshold for
each category, above which I can expect consistently correct results?
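One way such per-category thresholds could be chosen is from a held-out validation set: for each predicted class, take the lowest confidence cut-off at which the predictions at or above it still reach a target precision. The sketch below is plain Python under that assumption; pick_thresholds, accept, the sample rows, and the 0.9 target are all hypothetical names and values, not part of MLlib.

```python
def pick_thresholds(validation, target_precision=0.95):
    """Choose one confidence cut-off per predicted class.

    validation: list of (predicted_label, confidence, true_label) tuples.
    Returns {label: threshold}; the threshold is None when no cut-off
    reaches the target precision for that class.
    """
    by_class = {}
    for pred, conf, true in validation:
        by_class.setdefault(pred, []).append((conf, pred == true))

    thresholds = {}
    for label, rows in by_class.items():
        rows.sort(reverse=True)  # highest confidence first
        correct = 0
        chosen = None
        for i, (conf, ok) in enumerate(rows, start=1):
            correct += ok
            # Only cut between distinct confidence values.
            last_of_tie = i == len(rows) or rows[i][0] != conf
            if last_of_tie and correct / i >= target_precision:
                chosen = conf
        thresholds[label] = chosen
    return thresholds

def accept(pred, conf, thresholds):
    """Keep a prediction only if its class has a threshold and conf meets it."""
    cutoff = thresholds.get(pred)
    return cutoff is not None and conf >= cutoff

# Hypothetical validation results: (predicted, confidence, actual).
validation = [
    ("report", 0.99, "report"),
    ("report", 0.95, "report"),
    ("report", 0.70, "form"),
    ("report", 0.60, "report"),
    ("form", 0.90, "form"),
    ("form", 0.80, "form"),
]
thresholds = pick_thresholds(validation, target_precision=0.9)
print(thresholds)
print(accept("report", 0.97, thresholds), accept("report", 0.65, thresholds))
```

Thresholds picked this way only say how the classifier behaved on the validation data, so they would need re-checking whenever the document mix changes.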

Thanks
Jatin



-----
Novice Big Data Programmer
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Baye-s-classification-confidence-tp19341.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
