On 24 September 2012 14:41, Marcos Ortiz <mlor...@uci.cu> wrote:

> On 09/24/2012 06:29 AM, Christian Schäfer wrote:
>
>> I think a good starting point for that distribution guide would be a
>> feature matrix where all reasonable distributions could be compared.
>>
> +1 for this idea.
> I think that this feature matrix will be on the Hadoop wiki.
>

gets too controversial
I wouldn't be completely dismissive of Apache 1.0.3; it went through the large cluster QA by the QA team at Hortonworks (disclaimer: my colleagues); the 1.x branch is going to be long-lived and is in use in production.

>> There could be metrics for cross-cutting concerns like performance,
>> security, etc., referring to real benchmarks.
>> From this one could derive (maybe with additional explanations) which
>> distribution fits a certain use case best.
>>
> Umm, this is tricky. How can we decide which is the best fit for a certain
> type of problem?
> My suggestion is to avoid this, because it will bring some heated
> discussions and that's not the idea.
> It's my personal opinion.
>

What would be good would be more traces of real-world cluster use, stuff that can be fed into the gridmix 3 benchmarker [http://developer.yahoo.com/blogs/hadoop/posts/2010/04/gridmix3_emulating_production/]. If your workload gets pulled into the performance tests used by the Hadoop development teams.