On Sun, Mar 24, 2013 at 9:59 AM, Matthias Friedrich <[email protected]> wrote:
> On Friday, 2013-03-22, Josh Wills wrote:
> > I'm working on some tools for doing data integration and building machine
> > learning models w/Crunch, Mahout, and (soon!) HCatalog, and I wrote about
> > what I'm up to here:
> >
> > http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
> >
> > and the code is here: https://github.com/cloudera/ml
>
> Cool thing, thanks for open sourcing it!
>
> [...]
> > Q: Why not do this as part of the Crunch or Mahout projects?
> > A: Dependency management. Crunch doesn't depend on Mahout, and Mahout
> > doesn't depend on Crunch, and I think that for the sanity of the developers
> > of both projects, it should stay that way. Dependency management is already
> > enough of a nightmare for Hadoop projects that I didn't want to do anything
> > to make it worse. I will contribute anything from the toolkit back to
> > Crunch that is deemed useful by the community (e.g., the reservoir sampling
> > stuff in CRUNCH-178) and doesn't introduce any new dependencies.
>
> This is really sad - but most probably the best decision for now. Do
> you happen to know if there is any work planned on the Hadoop side to
> clean up this situation?

Nothing that I'm aware of, but I copied Roman, who is more knowledgeable on
this topic than I am.

> Regards,
> Matthias

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
