Re: Crunch, Mahout, and HCatalog

Josh Wills Sun, 24 Mar 2013 13:25:47 -0700

On Sun, Mar 24, 2013 at 9:59 AM, Matthias Friedrich <[email protected]> wrote:


> On Friday, 2013-03-22, Josh Wills wrote:
> > I'm working on some tools for doing data integration and building machine
> > learning models w/Crunch, Mahout, and (soon!) HCatalog, and I wrote about
> > what I'm up to here:
> >
> > http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
> >
> > and the code is here: https://github.com/cloudera/ml
>
> Cool thing, thanks for open sourcing it!
>
> [...]
> > Q: Why not do this as part of the Crunch or Mahout projects?
> > A: Dependency management. Crunch doesn't depend on Mahout, and Mahout
> > doesn't depend on Crunch, and I think that for the sanity of the
> > developers of both projects, it should stay that way. Dependency
> > management is already enough of a nightmare for Hadoop projects that I
> > didn't want to do anything to make it worse. I will contribute anything
> > from the toolkit back to Crunch that is deemed useful by the community
> > (e.g., the reservoir sampling stuff in CRUNCH-178) and doesn't introduce
> > any new dependencies.
>
> This is really sad - but most probably the best decision for now. Do
> you happen to know if there is any work planned on the Hadoop side to
> clean up this situation?
>

Nothing that I'm aware of, but I copied Roman, who is more knowledgeable on
this topic than I am.


>
> Regards,
>   Matthias
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
