On Tue, Mar 3, 2009 at 1:10 AM, Olga Natkovich <[email protected]> wrote:
> Hi Prasen,
>
> This looks great! How long did it take you to do the implementation?

It took a lot less time than implementing the algorithm in MapReduce Java did. That's exactly why I wanted to try it out in Pig. It took me ~3-4 days to write it, starting from learning Pig :)

Having said that, I feel it would be a lot easier for developers to use Pig if you provided some framework where I could write my UDFs in an interpreted language like jgroovy, Python, or Perl. It could even be Pig itself.

> As for performance, please take a look at
> http://wiki.apache.org/pig/PigUserCookbook for ideas on how to optimize
> your queries. Also, we are currently working on efficient multiquery
> support, which I think your queries will benefit from. You can track the
> project in https://issues.apache.org/jira/browse/PIG-627. We hope this
> work will be completed by early Q2.
>
> Olga
>
> > -----Original Message-----
> > From: prasenjit mukherjee [mailto:[email protected]]
> > Sent: Monday, March 02, 2009 8:58 AM
> > To: pig-user
> > Subject: Hoffman's PLSI implementation in PIG
> >
> > Hi,
> > I have implemented T. Hofmann's PLSI, based on the EM
> > algorithm, in Pig. The E/M logic was implemented in
> > ~30-35 lines of Pig Latin statements.
> > The implementation is available in Mahout as part of the
> > following patch:
> > https://issues.apache.org/jira/browse/MAHOUT-106
> >
> > Though the code works fine, I would appreciate any feedback on
> > the scalability of the Pig implementation, as there
> > are some joins/cogroups used to compute the estimated
> > probabilities p(s|z) and p(z|u).
> >
> > -Prasen
