On Tue, Mar 24, 2009 at 4:15 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> This sounds fantastic.
>
> I think that your scala code is interesting, but your thoughts on LDA are
> much more so. I tried doing a similar simplification of map-reduce program
> writing using groovy and found that in spite of even smaller programs than
> you quote for word-count, that the benefits in practice were relatively
> small. Using Pig was much more productive, even with the lack of any real
> programming language.

Thanks! I agree that SMR isn't there yet, and it really isn't a Mahout
thing. I could get closer to the Groovy line count, but my main goal was
to remove all the boilerplate associated with Hadoop (Text, IntWritable,
Mapper/Reducer) and to get closer to the real program logic. You are
right that Pig is more useful for many tasks, and one of my plans is to
duplicate some of its functionality, though I think I actually prefer
Dryad/LINQ's kind of syntax.

>
> It would also be interesting to see how you might attack semi-supervised
> multi-task learning using a well-founded Bayesian approach. For a
> non-Bayesian example with impressive results, see Ronan Collobert's paper:
> http://ronan.collobert.com/pub/2008_nlp_icml.html

Interesting. I'll take a closer look at it this evening.

-- David

>
> On Tue, Mar 24, 2009 at 12:26 AM, David Hall <d...@cs.stanford.edu> wrote:
>
>> This summer, I'd like to help contribute to the Mahout project. I read
>> Tijs Zwinkels' proposal, and I think that what I would like to work on
>> is sufficiently different from what he would like to do. First, I
>> would like to implement Latent Dirichlet Allocation, a popular topic
>> mixture model that learns both document clusters and word clusters. I
>> would then like to extend it to implement a number of general purpose
>> topic models, including Topics over Time, Pachinko Allocation, and
>> possibly Supervised Topic Models.
>>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> www.deepdyve.com
> 408-773-0110 ext. 738
> 858-414-0013 (m)
> 408-773-0220 (fax)
>