Rewrite Question

A key thing that improves the accuracy of naive Bayes over text is normalization of the TF vector (V):

    new V_i = log(1 + V_i) / sqrt(Sigma_k(V_k))

AbstractVector already does the L_p norm. Does it make sense to add a function that does the above normalization, say logNormalize(double x)? I will be adding this to PartialVectorMerger (in DictionaryVectorizer). So there are two choices: I can do this in the Vectorizer, or the Vectorizer can call this function.

Robin

On Sat, Sep 25, 2010 at 10:22 PM, Sean Owen <[email protected]> wrote:
> I think it's fine to do a rewrite at this stage. 0.5 sounds like a
> nice goal. Just recall that aspects of this will be 'in print' soon so
> yeah you want to a) plan to deprecate rather than remove the old code
> for some time, b) make the existing code "forwards compatible" with
> what you'll do next while you have the chance!
>
> On Sat, Sep 25, 2010 at 2:32 PM, Robin Anil <[email protected]> wrote:
> > Hi, I was in the middle of changing the classifier over to vectors and I
> > realized how radically it will change for people using it and how difficult
> > it is to fit the new interfaces Ted checked in. There are many components to
> > it, including the HBase stuff, which will take a lot of time to port. I
> > think it's best to start from scratch and rewrite it, keeping the old version so
> > that it won't break for users using it. If that is agreeable, I can complete
> > a new map/reduce + in-memory classifier in o.a.m.c.naivebayes fitting the
> > interfaces and deprecate the old bayes package. The new package won't have
> > the full set of features of the old one for the 0.4 release. But it will be
> > functional, and hopefully future proof. Let me know your thoughts.
> >
> > Robin
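
For reference, the normalization proposed at the top of this message could look roughly like the sketch below. It is only an illustration under several assumptions: it works on a plain double[] instead of a Mahout Vector, the class name LogNormalizeSketch is hypothetical, the logarithm is taken to be the natural log, the denominator is the square root of the sum of the entries exactly as the formula is written above, and an all-zero vector is returned unchanged. The logNormalize(double x) signature mentioned above takes a parameter whose meaning the message does not spell out, so the sketch omits it.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: illustrates the proposed
// normalization new V_i = log(1 + V_i) / sqrt(Sigma_k(V_k)) on a plain array.
public final class LogNormalizeSketch {

    private LogNormalizeSketch() {
    }

    // Returns a new array; the input term-frequency vector is not modified.
    public static double[] logNormalize(double[] tf) {
        double sum = 0.0;
        for (double value : tf) {
            sum += value;
        }
        double[] result = new double[tf.length];
        if (sum == 0.0) {
            // Assumption: an all-zero vector stays all zeros (avoids 0/0).
            return result;
        }
        double denominator = Math.sqrt(sum);
        for (int i = 0; i < tf.length; i++) {
            // Math.log1p(x) computes the natural log of (1 + x).
            result[i] = Math.log1p(tf[i]) / denominator;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] tf = {3.0, 0.0, 1.0};
        System.out.println(Arrays.toString(logNormalize(tf)));
    }
}
```

Keeping this as a standalone function matches the second choice above (the Vectorizer calls the function), which would also let other callers reuse it.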
