Hi all,

I'm trying to write my own code for an NB classifier, so that I can
use prior distributions other than, for example, Gaussian. To start, I
scripted something similar to the GaussianNB estimator in scikit-learn
(see the code below), but the two approaches give me different results
(the means and variances agree, though). Can anybody see why?

Btw, looking at the source code of the naive Bayes module
(./sklearn/naive_bayes.py), I see that a factor of 2 seems to be
missing from the Gaussian equation: in " n_ij = - 0.5 *
np.sum(np.log(np.pi * self.sigma_[i, :])) ", np.pi should be
multiplied by 2, since the Gaussian log-density normalizer is
-0.5 * log(2 * pi * sigma^2). Maybe the code is correct and it is
just me who doesn't understand it; in any case, adding the missing 2
doesn't solve my problem. Also, I don't understand why the
predict_proba method of the BaseNB class doesn't normalize the class
probabilities by their sum over all classes.
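To double-check that constant, here is a small sketch (the values are
made up) comparing a hand-written Gaussian log-density against
scipy.stats.norm.logpdf; the two only agree when the 2 is there:

```python
import numpy as np
from scipy import stats

# arbitrary test point and parameters
x, mu, sigma = 1.3, 0.5, 2.0

# Gaussian log-density with the full normalizer -0.5 * log(2 * pi * sigma^2)
manual = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (x - mu)**2 / sigma**2

# reference value from scipy
reference = stats.norm(loc=mu, scale=sigma).logpdf(x)

# dropping the 2, as in the line quoted above, shifts the result
wrong = -0.5 * np.log(np.pi * sigma**2) - 0.5 * (x - mu)**2 / sigma**2
```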

My code:

-------------------------

from scipy import stats
import numpy

from sklearn.metrics import roc_curve, auc
from sklearn.naive_bayes import GaussianNB


# --- my GaussianNB code
n0, n1 = len(X_learn0), len(X_learn1)
P0 = numpy.zeros(len(X_test)) + numpy.log(n0 / float(n0 + n1))
P1 = numpy.zeros(len(X_test)) + numpy.log(n1 / float(n0 + n1))

for y in xrange(numpy.shape(X_test)[1]):
    dist1 = stats.norm(loc=numpy.mean(X_learn1[:, y]),
                       scale=numpy.std(X_learn1[:, y]))
    dist0 = stats.norm(loc=numpy.mean(X_learn0[:, y]),
                       scale=numpy.std(X_learn0[:, y]))

    P1 += numpy.log(dist1.pdf(X_test[:, y]))
    P0 += numpy.log(dist0.pdf(X_test[:, y]))

P0 = numpy.exp(P0)
P1 = numpy.exp(P1)

# normalize with a shared denominator; reassigning P0 first and then
# dividing P1 by the already-normalized P0 skews the posteriors
total = P0 + P1
P0 = P0 / total
P1 = P1 / total

fpr_nb, tpr_nb, thresholds = roc_curve(Y_test, P1)
roc_auc_nb = auc(fpr_nb, tpr_nb)
print pdb + " Area under the ROC curve for my NB: %f" % roc_auc_nb


# --- learn Naive Bayes using Scikit code
nb_learner = GaussianNB()
nb_learner.fit(numpy.concatenate([X_learn0, X_learn1]),
               [0] * len(X_learn0) + [1] * len(X_learn1))

fpr_nb, tpr_nb, thresholds = roc_curve(
    Y_test, nb_learner.predict_proba(X_test)[:, 1])
roc_auc_nb = auc(fpr_nb, tpr_nb)
print pdb + " Area under the ROC curve for SL NB: %f" % roc_auc_nb

---------------------------------------------
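For what it's worth, regarding my predict_proba question: here is how
I'd expect the normalization over classes to look if done in log
space. The joint log-likelihoods below are made-up numbers, and
scipy.special.logsumexp is used to avoid underflow:

```python
import numpy as np
from scipy.special import logsumexp

# hypothetical joint log-likelihoods log P(x, c): 3 samples, 2 classes
joint = np.array([[-3.2, -1.1],
                  [-0.4, -5.0],
                  [-2.0, -2.0]])

# normalize over classes in log space:
# log P(c | x) = log P(x, c) - log sum_c' P(x, c')
log_post = joint - logsumexp(joint, axis=1, keepdims=True)
post = np.exp(log_post)  # each row now sums to 1
```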

Best,

Will

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
