Re: Hoffman's PLSI implementation in PIG

prasenjit mukherjee Mon, 02 Mar 2009 19:00:28 -0800

On Tue, Mar 3, 2009 at 1:10 AM, Olga Natkovich <[email protected]> wrote:


> Hi Prasen,
>
> This looks great! How long did it take you to do the implementation?


It took a lot less time than implementing the algorithm in MapReduce Java.
That is exactly why I wanted to try it out in Pig. It took me ~3-4 days to
write it, starting from learning Pig :)

Having said that, I feel it would be a lot easier for developers to use Pig if
you provided some framework where I could write my UDFs in an interpreted
language like jgroovy, python, or perl. It could even be Pig itself.


>
>
> As for performance, please take a look at
> http://wiki.apache.org/pig/PigUserCookbook for ideas on how to optimize
> your queries. Also, we are currently working on efficient multiquery
> support, which I think your queries will benefit from. You can track the
> project at https://issues.apache.org/jira/browse/PIG-627. We hope this
> work will be completed by early Q2.
>
> Olga
>
> > -----Original Message-----
> > From: prasenjit mukherjee [mailto:[email protected]]
> > Sent: Monday, March 02, 2009 8:58 AM
> > To: pig-user
> > Subject: Hoffman's PLSI implementation in PIG
> >
> > Hi,
> >    I have implemented T. Hofmann's PLSI, based on the EM
> > algorithm, in Pig. The E/M logic was implemented in
> > ~30-35 lines of Pig Latin statements.
> > The implementation is available in Mahout as part of the
> > following patch:
> > https://issues.apache.org/jira/browse/MAHOUT-106.
> >
> > Though the code works fine, I would appreciate any feedback on
> > the scalability of the Pig implementation, as some
> > joins/cogroups are used to compute the estimated
> > probabilities p(s|z) and p(z|u).
> >
> > -Prasen
> >
>
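
For readers unfamiliar with the E/M steps mentioned in the quoted message, here
is a minimal single-machine sketch in Python. It is not the Pig Latin code from
the MAHOUT-106 patch. The function name plsi_em, its parameters, and the
reading of u as a user and s as an item are assumptions made for illustration.
It assumes the model p(s|u) = sum over z of p(z|u) * p(s|z). The two
accumulators keyed by (z, u) and (s, z) play roughly the role that
joins/cogroups would play in a distributed version.

```python
import random
from collections import defaultdict


def plsi_em(pairs, num_topics, iterations=20, seed=0):
    """Fit p(z|u) and p(s|z) from a list of observed (u, s) pairs via EM.

    Hypothetical helper for illustration; returns two nested dicts:
    p_z_u[u][z] and p_s_z[z][s].
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in pairs})
    items = sorted({s for _, s in pairs})
    topics = range(num_topics)

    def random_distribution(keys):
        # Strictly positive random weights, normalized to sum to 1.
        weights = {k: rng.random() + 0.1 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    p_z_u = {u: random_distribution(topics) for u in users}
    p_s_z = {z: random_distribution(items) for z in topics}

    for _ in range(iterations):
        # E-step: posterior q(z|u,s) for each observation, accumulated
        # into expected counts grouped by (z, u) and by (s, z).
        count_zu = defaultdict(float)
        count_sz = defaultdict(float)
        for u, s in pairs:
            joint = [p_z_u[u][z] * p_s_z[z][s] for z in topics]
            norm = sum(joint)
            for z in topics:
                q = joint[z] / norm
                count_zu[(z, u)] += q
                count_sz[(s, z)] += q

        # M-step: renormalize the expected counts into probabilities.
        for u in users:
            total = sum(count_zu[(z, u)] for z in topics)
            p_z_u[u] = {z: count_zu[(z, u)] / total for z in topics}
        for z in topics:
            total = sum(count_sz[(s, z)] for s in items)
            p_s_z[z] = {s: count_sz[(s, z)] / total for s in items}

    return p_z_u, p_s_z
```

Calling plsi_em([("u1", "a"), ("u1", "b"), ("u2", "c")], num_topics=2) returns
the two tables. Each p_z_u[u] and each p_s_z[z] is a probability distribution
that sums to 1.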
