Thanks for the tip about numerical accuracy issues and the elegant solution exploiting log/exp. It is very much appreciated.
Sergey

On Fri, May 3, 2013 at 11:42 AM, Kai Londenberg <kai.londenb...@googlemail.com> wrote:

> Hi,
>
> Just a hint: It's usually better to work with log probabilities and sum
> over them than to work with raw probabilities and use multiplication.
> You might easily run into numerical accuracy issues otherwise.
>
> i.e. exploit this fact:
>
> product(x1, ..., xn) = exp(sum(log(x1), ..., log(xn)))
>
> best,
>
> Kai Londenberg
>
> 2013/5/3 Sergey Goder <sergeygo...@gmail.com>:
> > I'm creating a multinomial naive Bayes classifier using Pig and need to
> > compute the product of probabilities. There are an arbitrary number of
> > values in the bag, so I would like to be able to use a function similar
> > to the builtin SUM to do this. I looked through the source code and
> > found that with some really simple changes to SUM.java I can create a
> > PROD.java function. I included it in my piggybank and have been using it
> > successfully.
> >
> > I was curious what the community thought about including this function
> > as a builtin function in a future release? Or would it make more sense
> > to keep this function as a UDF in a piggybank?
> >
> > Thanks,
> > Sergey
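
For anyone finding this thread later, here is a minimal sketch of the log/exp identity from Kai's message, written in plain Java since the thread refers to SUM.java. The class and method names (LogProduct, logProduct, product) are made up for illustration; this is not the PROD.java mentioned above and not part of Pig or the piggybank. It assumes every input is a probability in (0, 1].

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not Pig's SUM or the PROD UDF.
final class LogProduct {

    // Returns log(x1 * ... * xn), computed as log(x1) + ... + log(xn).
    // A zero input yields -Infinity, since Math.log(0.0) is -Infinity.
    static double logProduct(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    // Returns the product itself via exp(sum(log(x))).
    // This can still underflow; for comparing classes, stay in log space.
    static double product(double[] probabilities) {
        return Math.exp(logProduct(probabilities));
    }

    public static void main(String[] args) {
        // 200 factors of 1e-5: the true product is 1e-1000, far below
        // the smallest positive double.
        double[] tiny = new double[200];
        Arrays.fill(tiny, 1e-5);

        double naive = 1.0;
        for (double p : tiny) {
            naive *= p;
        }

        System.out.println("naive product: " + naive); // underflows to 0.0
        System.out.println("sum of logs:   " + logProduct(tiny)); // finite
    }
}
```

For a naive Bayes classifier it is probably enough to compare the summed logs across classes directly, without calling exp at all, which also means the builtin SUM over log-transformed values may already do the job.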