On 10 May 2012 17:38, Andrew Purtell <apurt...@apache.org> wrote:
> Regarding HDFS miniclusters, the interface is already limited-private
> and there is no pressing need, but we do have test cases where we need
> to simulate DataNode failures. Also, I can conceive of an application
> unit test where I would want to set replication to 1 on some file,
> then corrupt blocks, then check that repair (at the application level)
> was successful. Would some limited public interface for that be
> plausible?

I'm going to weigh in as a fan of MiniDFS and MiniMR clusters:

-easiest way to spin up a basic Hadoop cluster
-lets you test failure handling as well as functionality
-lets you test code that talks to DFS clusters remotely
-lets you test topology code
-very efficient for work that goes through a couple of hundred K records.

It's the best Hadoop cluster to run on a laptop.

Today's classes are very much designed for use within the Hadoop core -and, even there, for use in test runs. For example, they depend on system properties (test.build.data) to work (see https://issues.apache.org/jira/browse/HDFS-2209 ); pre-2.0 you need a factory that patches things in at construct time:

http://smartfrog.svn.sf.net/viewvc/smartfrog/trunk/core/hadoop-components/grumpy/src/org/smartfrog/services/hadoop/grumpy/LocalDFSCluster.groovy?revision=8882&view=markup

That example and the equivalent for the MiniMR cluster (*) not only fix up the properties to work, they implement a getURI() method that returns the relevant URI of the service -filesystem and JT respectively- which I've found somewhat convenient.

In 2.0, as well as the MiniMR cluster going away, something changed in the HDFS interfaces that stopped my subclass from building -I think it was the accessibility or location of HdfsConstants. Whatever it was, it is making migration of my test code from 1.x to 2.x hard, which is discouraging me from testing against it -I can't have test setups that work on both branches.

Then there's the fact that on 1.x at least, the mini clusters are hidden in the hadoop-test-x
jar, and it's not always been the case that this JAR has made it onto the Maven repositories.

Taken together, then, these issues show that while MiniDFSCluster and the MR equivalents work for the core code, where accessibility, backwards compatibility and redistribution are non-issues, the classes aren't designed for downstream use -yet the number of people trying to use them, Andrew and myself included, shows that we want to.

I would like to see something stable and public that could be used in this way: stable classes in 1.x and 2.x that let your code build on both branches, classes that don't need fixup before creation, and artifacts like hadoop-minicluster.jar that you can depend on without needing the rest of the tests. Oh, and a set of tests to verify cluster stability.

+1 then to making mini clusters that downstream projects can use.

Am I going to volunteer to do this? I could add it to my todo list, which means "not for a while". If someone else has a go, I promise to review it and commit it if it's ready.

-Steve

(*) http://smartfrog.svn.sf.net/viewvc/smartfrog/trunk/core/hadoop-components/grumpy/src/org/smartfrog/services/hadoop/grumpy/LocalMRCluster.groovy?revision=8882&view=markup
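PS: the "patch things in at construct time" fixup described above boils down to something like the sketch below. This is plain Java with illustrative class and method names, not the actual Grumpy code; the one assumption taken from Hadoop is that MiniDFSCluster locates its data directories via the test.build.data system property, which core's test harness sets but downstream builds usually don't.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Sketch of the "patch system properties before construction" pattern
 * the Grumpy wrapper classes use. Names here are illustrative.
 */
public class MiniClusterFixup {

    // The property MiniDFSCluster reads to decide where to put its
    // data directories; unset outside Hadoop's own test runs.
    static final String TEST_BUILD_DATA = "test.build.data";

    /**
     * Point test.build.data at a fresh temporary directory if nothing
     * has set it yet; return the directory in use either way.
     */
    public static File ensureTestDataDir() throws IOException {
        String existing = System.getProperty(TEST_BUILD_DATA);
        if (existing != null) {
            return new File(existing);
        }
        File dir = Files.createTempDirectory("minidfs").toFile();
        System.setProperty(TEST_BUILD_DATA, dir.getAbsolutePath());
        return dir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(ensureTestDataDir());
    }
}
```

A subclass of MiniDFSCluster (or a factory in front of it) would call ensureTestDataDir() before invoking the cluster constructor, which is essentially what the Groovy wrappers linked above do before handing control to the superclass.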