Naive Bayes parameters

2014-08-06 Thread SK

1) How is the minPartitions parameter in the NaiveBayes example used? What is
its default value?

2) Why is numFeatures specified as a parameter? Can it not be obtained from
the data? This parameter is not specified for the other MLlib algorithms.

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-parameters-tp11592.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Naive Bayes parameters

2014-08-06 Thread Burak Yavuz
Hi,

Could you please send the link to the example you are talking about?
As far as I know, minPartitions and numFeatures do not exist in the current
NaiveBayes API, so I can't answer your second question.

Regarding your first question, I'm guessing blindly, but it is probably related
to numPartitions, the number of partitions your dataset is split into.
It is usually best to set this to the number of cores your machine has.
You can also try double or half the number of cores.

Best,
Burak




Re: Naive Bayes parameters

2014-08-07 Thread SK
I followed the example in
examples/src/main/scala/org/apache/spark/examples/mllib/SparseNaiveBayes.scala.

In this file, Params is defined as follows:

  case class Params(
      input: String = null,
      minPartitions: Int = 0,
      numFeatures: Int = -1,
      lambda: Double = 1.0)

In the main function, the option parser accepts numFeatures as an option.
But I looked at the code in more detail just now and found the following:

  val model = new NaiveBayes().setLambda(params.lambda).run(training)

So it looks like only the lambda parameter is used when the model is created.
Perhaps the example needs to be cleaned up in the next release. I am
currently using Spark version 1.0.1.


thanks







Re: Naive Bayes parameters

2014-08-07 Thread Xiangrui Meng
numFeatures is used in data loading:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/SparseNaiveBayes.scala#L76
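
To illustrate why a loader for sparse, LIBSVM-style text might take a numFeatures argument at all, here is a minimal stand-alone sketch. It is not MLlib's implementation: the object LibSvmSketch, the SparseRecord type, and the helper names are all hypothetical, and it assumes that a non-positive numFeatures means "infer the dimension from the largest index seen in the data".

```scala
// Hypothetical stand-alone sketch; this is NOT Spark's loader.
object LibSvmSketch {
  // A parsed record: label plus sparse features as zero-based indices and values.
  final case class SparseRecord(label: Double, indices: Array[Int], values: Array[Double])

  // Parse one line of the form "label idx1:val1 idx2:val2 ..." (indices are one-based on disk).
  def parseLine(line: String): SparseRecord = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val parts = tok.split(":")
      (parts(0).toInt - 1, parts(1).toDouble) // convert to a zero-based index
    }
    SparseRecord(label, pairs.map(_._1), pairs.map(_._2))
  }

  // Assumption: a positive numFeatures is trusted as given; otherwise the
  // dimension is inferred by scanning every record for its largest index.
  def resolveNumFeatures(records: Seq[SparseRecord], numFeatures: Int): Int =
    if (numFeatures > 0) numFeatures
    else records
      .map(r => if (r.indices.isEmpty) 0 else r.indices.max + 1)
      .foldLeft(0)((a, b) => math.max(a, b))

  // Expand a sparse record into a dense array of the resolved dimension.
  def toDense(r: SparseRecord, dim: Int): Array[Double] = {
    val out = Array.fill(dim)(0.0)
    r.indices.zip(r.values).foreach { case (i, v) => out(i) = v }
    out
  }
}
```

A sparse record only lists its non-zero entries, so no single line reveals the full dimension. Passing numFeatures up front avoids an extra pass over the data, and it keeps training and test sets at the same dimension even when the highest-numbered feature never appears in one of them.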




Re: Naive Bayes parameters

2014-08-07 Thread SK
Ok, thanks for clarifying. So it looks like numFeatures is only relevant for the
LIBSVM format. I am using LabeledPoint, so if the data is not sparse, perhaps
numFeatures is not required. I thought that the Params class defined all
the parameters passed to the ML algorithm, but it looks like it also
includes other options. Just as a suggestion: it may be useful to have a
separate class for just the algorithm parameters, so it is clear what can be
tuned.

thanks
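
One way the suggestion above could look, as a rough sketch only: the Params fields from the example are split into an input-related group and an algorithm-related group. The names ParamsSketch, InputParams, and AlgoParams are hypothetical and do not exist in Spark; the field names and defaults mirror the Params class quoted earlier in this thread.

```scala
// Hypothetical restructuring of the example's Params; not part of Spark.
object ParamsSketch {
  // Options that only affect how the input data is read.
  final case class InputParams(
      input: String = null, // null mirrors the default in the quoted example
      minPartitions: Int = 0,
      numFeatures: Int = -1)

  // Options that are actually passed to the learning algorithm.
  final case class AlgoParams(lambda: Double = 1.0)

  // The full set of command-line options, grouped by purpose.
  final case class Params(
      io: InputParams = InputParams(),
      algo: AlgoParams = AlgoParams())
}
```

With this split, a reader can tell from AlgoParams alone that lambda is the only tunable value handed to NaiveBayes in the example, while minPartitions and numFeatures stay visibly on the data-loading side.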


