2014-06-11 18:16 GMT+02:00 Gavin Gray <[email protected]>:
> Yeah, you'd have to hand in a vector listing which distribution to use for
> each element in the feature vector. Weka might have a way round this, but
> I'll have to try using it to see what the interface is like. They reference
> a paper that estimates the distribution of each feature using KDE:
> http://www.cs.iastate.edu/~jtian/cs573/Papers/John-UAI-95.pdf
>
> I guess then you wouldn't have to specify but it seems strange to try to
> estimate the distribution of a feature you know is Bernoulli, for instance

Wow, kernel naive Bayes :)

But that paper doesn't solve the issue I raised: "Nominal attributes'
distributions are still learned by storing a single number per value
that represents the sample frequency" (page 3, bottom right). Domingos
[1] summarizes this algorithm as a better way than Gaussian NB to
model real-valued inputs, not a way to combine real-valued and
categorical variables.
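
To make the distinction concrete, here is a rough sketch of what a "mixed"
naive Bayes with one distribution per feature could look like. This is not
scikit-learn's API and not the paper's algorithm: the function names
fit_mixed_nb and predict_mixed_nb, the kinds argument and its "gaussian" /
"nominal" labels are all made up for illustration, and I swapped in a plain
Gaussian for real-valued features instead of a kernel density estimate.
Nominal features get one smoothed frequency per value, as in the quote above.

```python
import numpy as np


def fit_mixed_nb(X, y, kinds):
    """Fit per-class, per-feature parameters.

    kinds[j] is "gaussian" or "nominal" (hypothetical labels, one per column).
    """
    classes = np.unique(y)
    model = {"classes": classes, "kinds": list(kinds),
             "log_prior": [], "params": []}
    for c in classes:
        Xc = X[y == c]
        model["log_prior"].append(np.log(len(Xc) / len(X)))
        per_feature = []
        for j, kind in enumerate(kinds):
            col = Xc[:, j]
            if kind == "gaussian":
                # Mean and variance; small constant avoids division by zero.
                per_feature.append((col.mean(), col.var() + 1e-9))
            else:
                # One Laplace-smoothed frequency per value seen in training.
                values = np.unique(X[:, j])
                counts = {v: np.sum(col == v) + 1.0 for v in values}
                total = sum(counts.values())
                per_feature.append({v: n / total for v, n in counts.items()})
        model["params"].append(per_feature)
    return model


def predict_mixed_nb(model, X):
    """Return the class with the highest joint log-probability per row."""
    scores = np.zeros((len(X), len(model["classes"])))
    for ci, per_feature in enumerate(model["params"]):
        scores[:, ci] = model["log_prior"][ci]
        for j, kind in enumerate(model["kinds"]):
            col = X[:, j]
            if kind == "gaussian":
                mean, var = per_feature[j]
                scores[:, ci] += -0.5 * (np.log(2 * np.pi * var)
                                         + (col - mean) ** 2 / var)
            else:
                # Values never seen during fit raise KeyError in this sketch.
                table = per_feature[j]
                scores[:, ci] += np.log([table[v] for v in col])
    return model["classes"][scores.argmax(axis=1)]
```

Usage would be something like fit_mixed_nb(X, y, ["gaussian", "nominal"])
followed by predict_mixed_nb(model, X_new). The point is only that the
per-feature log-likelihoods add up the same way regardless of which
distribution each one comes from, so the user-supplied list is the only
extra piece of interface.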

(Reminds me of the generative classification PR. Jake, are you following along?)

[1] http://www.cs.ucdavis.edu/~vemuri/classes/ecs271/Bayesian.pdf

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general