On Thu, Oct 21, 2010 at 05:53PM, Ian Holsman wrote:
> In discussing it with people, I've heard that a major issue (not the only
> one I'm sure) is the lack of resources to actually test the Apache releases
> on large clusters, and that it is very hard getting this done in short
> cycles (hence the large gap between 20.x and 21).
I do agree that the lack of resources for testing Hadoop is a problem.
However, there might be some slight difference in the meaning of the
word 'resources' ;)

The only way, IMO, to get reasonable testing done on a system as complex
as Hadoop is to invest in automatic validation of builds at the system
level. This requires a few things (resources, if you will):
 - extra hardware (the easiest and cheapest problem)
 - automatic deployment, testing, and analysis
 - development of system tests that can control and observe cluster
   behavior (in other words, something more sophisticated than just
   shell scripts; see the sketch at the bottom of this mail)

And for semi-adequate system testing you don't need a large cluster:
10-20 nodes will be sufficient in most cases. But automating all the
processes, starting from deployment, is the key.

Test automation is in slightly better shape, for Hadoop has a system
test framework called Herriot (part of the Hadoop code base for about
7 months now), but it still needs further extension.

Hopefully this gives you a picture of the cluster-testing side of the
issue.

Cos

> So I thought I would start the thread to see if we could at least
> identify what people think the problems are.
>
>
> On Thu, Oct 21, 2010 at 3:30 PM, Allen Wittenauer
> <awittena...@linkedin.com> wrote:
>
> >
> > On Oct 21, 2010, at 12:13 PM, Ian Holsman wrote:
> >
> > > Hi guys.
> > >
> > > I wanted to start a conversation about how we could merge the
> > > Cloudera + Yahoo distributions of Hadoop into our codebase,
> > > and what would be required.
> >
> >
> > *grabs popcorn*
> >
>
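P.S. To make the "more sophisticated than just shell scripts" point
concrete, here is a minimal JUnit sketch of the kind of system test I
mean: one that both controls and observes live daemons on a deployed
cluster. The ClusterController interface and every method on it are
hypothetical stand-ins for illustration only, not Herriot's actual API
(Herriot's real helpers live under org.apache.hadoop.test.system and
play a similar role).

  import static org.junit.Assert.assertEquals;
  import static org.junit.Assert.assertTrue;

  import org.junit.AfterClass;
  import org.junit.BeforeClass;
  import org.junit.Test;

  // Hypothetical cluster driver -- a stand-in for a Herriot-style
  // control channel; how it talks to daemons (SSH, RPC, etc.) is out
  // of scope for this sketch.
  interface ClusterController {
    static ClusterController connect(String gateway) {
      throw new UnsupportedOperationException("sketch only");
    }
    void ensureAllDaemonsUp() throws Exception;
    int liveDataNodes() throws Exception;
    void killDaemon(String daemon, int nodeIndex) throws Exception;
    void restartDaemon(String daemon, int nodeIndex) throws Exception;
    boolean waitForUnderReplicatedBlocks(int target, long timeoutMs)
        throws Exception;
  }

  public class TestReplicationAfterDataNodeKill {

    private static ClusterController cluster;

    @BeforeClass
    public static void attach() throws Exception {
      // Attach to an already-deployed 10-20 node test cluster instead
      // of an in-process mini cluster.
      cluster = ClusterController.connect("test-cluster-gateway");
      cluster.ensureAllDaemonsUp();
    }

    @Test
    public void underReplicatedBlocksRecover() throws Exception {
      int liveBefore = cluster.liveDataNodes();

      // Control the cluster: inject a fault by killing one datanode.
      cluster.killDaemon("datanode", 3);
      assertEquals(liveBefore - 1, cluster.liveDataNodes());

      // Observe the cluster: assert that the lost blocks get
      // re-replicated within ten minutes -- the kind of check that is
      // awkward to express reliably in a shell script.
      assertTrue("blocks still under-replicated after timeout",
                 cluster.waitForUnderReplicatedBlocks(0, 10 * 60 * 1000));
    }

    @AfterClass
    public static void restore() throws Exception {
      cluster.restartDaemon("datanode", 3);
      cluster.ensureAllDaemonsUp();
    }
  }

The useful property of a test shaped like this is that the fault
injection and the assertion on the cluster's reaction live in one
place, so the whole run can be automated end to end and its result
analyzed like any other JUnit report.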