Naive Bayes parameters
1) How is the minPartitions parameter in the NaiveBayes example used? What is its default value?
2) Why is numFeatures specified as a parameter? Can it not be obtained from the data? This parameter is not specified for the other MLlib algorithms.

Thanks

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-parameters-tp11592.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Naive Bayes parameters
Hi,

Could you please send the link to the example you are talking about? As far as I know, minPartitions and numFeatures do not exist in the current NaiveBayes API, so I don't know how to answer your second question.

Regarding your first question, my blind guess is that it is related to numPartitions, the number of partitions your dataset is split into. It is usually best to set this to the number of cores your machine has; you can also try double or half that number.

Best,
Burak

- Original Message -
From: "SK"
To: u...@spark.incubator.apache.org
Sent: Wednesday, August 6, 2014 3:45:09 PM
Subject: Naive Bayes parameters

1) How is the minPartitions parameter in NaiveBayes example used? What is the default value?
2) Why is the numFeatures specified as a parameter? Can this not be obtained from the data? This parameter is not specified for the other MLlib algorithms.
Re: Naive Bayes parameters
I followed the example in examples/src/main/scala/org/apache/spark/examples/mllib/SparseNaiveBayes.scala.

In this file, Params is defined as follows:

    case class Params(
        input: String = null,
        minPartitions: Int = 0,
        numFeatures: Int = -1,
        lambda: Double = 1.0)

In the main function, the option parser accepts numFeatures as an option. But I looked at the code in more detail just now and found the following:

    val model = new NaiveBayes().setLambda(params.lambda).run(training)

So it looks like only the lambda parameter is used when the model is created. Perhaps the example needs to be cleaned up in the next release. I am currently using Spark version 1.0.1.

Thanks
Re: Naive Bayes parameters
numFeatures is used in data loading: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/SparseNaiveBayes.scala#L76

On Thu, Aug 7, 2014 at 12:47 AM, SK wrote:
> I followed the example in
> examples/src/main/scala/org/apache/spark/examples/mllib/SparseNaiveBayes.scala.
>
> In this file Params is defined as follows:
>
> case class Params (
>     input: String = null,
>     minPartitions: Int = 0,
>     numFeatures: Int = -1,
>     lambda: Double = 1.0)
>
> In the main function, the option parser accepts numFeatures as an option.
> But I looked at the code in more detail just now and found the following:
>
> val model = new NaiveBayes().setLambda(params.lambda).run(training)
>
> So looks like at the time of creation only the lambda parameter is used.
> Perhaps the example needs to be cleaned up during the next release. I am
> currently using Spark version 1.0.1.
>
> thanks
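To illustrate why a loader for sparse LIBSVM-style data takes numFeatures at all, here is a minimal plain-Scala sketch with no Spark dependency. SparseExample, parseLine, and resolveNumFeatures are hypothetical names made up for illustration, not MLlib's API, and the sketch assumes that a nonpositive numFeatures means "infer it from the data", in line with the -1 default in Params.

```scala
// Hypothetical stand-in for one sparse example: a label plus
// zero-based feature indices and their values.
final case class SparseExample(label: Double, indices: Array[Int], values: Array[Double])

object LibSvmSketch {
  // Parse one LIBSVM-style line such as "1 3:0.5 7:2.0".
  // Indices are one-based in the file, so they are shifted to zero-based here.
  def parseLine(line: String): SparseExample = {
    val tokens = line.trim.split("\\s+")
    val pairs = tokens.tail.map { item =>
      val parts = item.split(":")
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    SparseExample(tokens.head.toDouble, pairs.map(_._1), pairs.map(_._2))
  }

  // Use the caller's numFeatures when it is positive; otherwise infer the
  // dimension from the largest index seen in the data.
  def resolveNumFeatures(examples: Seq[SparseExample], numFeatures: Int): Int =
    if (numFeatures > 0) numFeatures
    else examples.foldLeft(0) { (best, e) =>
      if (e.indices.isEmpty) best else math.max(best, e.indices.max + 1)
    }
}
```

A sparse line only lists the nonzero features, so the inferred dimension depends on which rows happen to be present. Passing numFeatures explicitly is one way to keep the dimension stable when, for example, training and test files are loaded separately.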
Re: Naive Bayes parameters
OK, thanks for clarifying. So it looks like numFeatures is only relevant for the LIBSVM format. I am using LabeledPoint, so if the data is not sparse, perhaps numFeatures is not required.

I thought that the Params class defined all the parameters passed to the ML algorithm, but it looks like it also includes other options. Just a suggestion: it may be useful to have a separate class for just the algorithm parameters, so it is clear what can be tuned.

Thanks
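One way the suggested separation could look, as a rough sketch only: the class names IoOptions, NaiveBayesParams, and RunConfig are hypothetical and do not exist in the Spark examples. The fields and defaults are copied from the Params definition quoted earlier in the thread.

```scala
// Hypothetical split of the example's Params. Names are illustrative only.

// Options that affect how the data is loaded, not the model itself.
final case class IoOptions(
    input: String,
    minPartitions: Int = 0,
    numFeatures: Int = -1)

// Parameters that are actually passed to the algorithm and can be tuned.
final case class NaiveBayesParams(lambda: Double = 1.0)

// Everything one run of the example needs, with the two concerns kept apart.
final case class RunConfig(io: IoOptions, algo: NaiveBayesParams = NaiveBayesParams())

object ParamsSketch {
  // List only the tunable algorithm parameters of a run.
  def tunable(cfg: RunConfig): Map[String, Double] =
    Map("lambda" -> cfg.algo.lambda)
}
```

With this layout, a reader can see from the types alone that lambda is the only value reaching the model, while input, minPartitions, and numFeatures belong to data loading.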