Hello,

Testing hadoop-hdfs and hadoop-mapreduce (0.23/1.0/2.0) takes a noticeably long time, which has a number of obvious downsides. My team and I are trying to analyze the reasons and identify possible improvements. In particular, we noticed that over the last few years there have been a number of attempts to optimize and speed up the HDFS/MR JUnit tests, namely:
1. Introducing a unit test framework

A number of pure unit tests (mock-based, non-integration) were added, see HDFS-669, MAPREDUCE-1050, HADOOP-6423. However, it seems that these tests are not separated from the integration tests (MiniCluster-based); some of them were moved to the hadoop-hdfs/src/tests/unit and hadoop-mapreduce-project/src/test/unit directories and disabled in mavenized builds starting from 0.23. There was an attempt to fix this in HDFS-2276, but it is still unresolved.

2. Smoke tests (10-minute test target)

There was a successful initiative to select a subset of tests in the HDFS and MapReduce modules to be used as smoke tests with a running time under 10 minutes. The tests were chosen manually, on the condition that they give large code coverage of the most important packages/classes. This was done prior to 0.23/2.0, in the Ant builds, see HADOOP-5628, HDFS-458, MAPREDUCE-670. Apparently, the mavenized builds do not use this feature.

3. Separating tests into categories

See HADOOP-6399, open since 2009.

In general, several things would help make Hadoop more stable and the development and release process less painful: separating tests into categories, having fast true unit tests in addition to the great coverage Hadoop already gets from integration/component tests, and then adding sets of capacity/availability tests.

So would it be useful to clean up, stabilize and enhance the existing unit/integration tests, and to assemble a suite of pure unit tests and short-running integration tests, with coverage measured for all three sets (unit, smoke, full)?

Is it worth pursuing this? What's the best place to start? Is it worth completing items 1 and 2 mentioned above?

Any comments or hints would be really appreciated.

--
Andrey Klochkov