HiveQA is taking too long to run. I browsed the test results a bit; the main offenders are, unsurprisingly, the various CliDrivers. I think there’s a JIRA in progress to speed up the Tez CLI driver, and Spark and HBase have tolerable runtimes.
That leaves us with the base CliDriver and MR. Base tests generally take 0-30 seconds, 1-2 minutes at most, but there are some ridiculous test runtimes (these are fairly consistent between runs):

  testCliDriver_rcfile_merge1                    31 min
  testCliDriver_escape2                          13 min
  testCliDriver_escape1                          8 min 10 sec
  testCliDriver_dynpart_sort_opt_vectorization   4 min 47 sec
  testCliDriver_unionDistinct_1                  4 min 32 sec
  testCliDriver_dynpart_sort_optimization        4 min 2 sec
  testCliDriver_rcfile_merge2                    3 min 55 sec
  testCliDriver_vector_leftsemi_mapjoin          3 min 53 sec
  testCliDriver_archive_excludeHadoop20          3 min 13 sec

If we remove or rein in the top 3 tests, the testCliDriver runtime will go down by almost an hour.

Anyone particularly attached to the rcfile tests? It’s all good to test rcfile, but it’s a rarely used format, with Avro, ORC and Parquet seemingly having taken over (not to mention Text), and the test should not take half an hour. I suggest we disable this test (rcfile_merge1) and file a JIRA to investigate its perf if someone feels it’s important.

Another work item is to look at why the escape tests take so long. Escaping should be a simple thing to test, not 21 minutes in aggregate (most tests finish in 0-2 minutes).

Then, the MiniMR test run takes 2 hours. Some GBY index specific tests are the worst offenders (gbtoidx), to the tune of 35 minutes for 3 tests, as well as smb_mapjoin at 15 minutes. Since the plan was to drop MR support on master, how about starting by not running these long MR tests and deprecating the MR engine, while still keeping it around until it is actually removed?
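
For anyone who wants to double-check the base CliDriver savings estimates, here is a rough sketch that just sums the runtimes quoted above. The numbers are copied from this mail; the helper names (`total_seconds`, `fmt`) are made up for illustration and are not part of any Hive tooling.

```python
# Runtimes copied from the list in this mail, as (minutes, seconds).
runtimes = {
    "testCliDriver_rcfile_merge1": (31, 0),
    "testCliDriver_escape2": (13, 0),
    "testCliDriver_escape1": (8, 10),
    "testCliDriver_dynpart_sort_opt_vectorization": (4, 47),
    "testCliDriver_unionDistinct_1": (4, 32),
    "testCliDriver_dynpart_sort_optimization": (4, 2),
    "testCliDriver_rcfile_merge2": (3, 55),
    "testCliDriver_vector_leftsemi_mapjoin": (3, 53),
    "testCliDriver_archive_excludeHadoop20": (3, 13),
}


def total_seconds(names):
    """Sum the runtimes of the given tests, in seconds (hypothetical helper)."""
    return sum(runtimes[n][0] * 60 + runtimes[n][1] for n in names)


def fmt(seconds):
    """Render a number of seconds as 'M min S sec' (hypothetical helper)."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} sec"


# The three slowest tests: the "almost an hour" figure.
top3 = [
    "testCliDriver_rcfile_merge1",
    "testCliDriver_escape2",
    "testCliDriver_escape1",
]
# The two escape tests: the "21 minutes in aggregate" figure.
escape = ["testCliDriver_escape2", "testCliDriver_escape1"]

print("top 3:   ", fmt(total_seconds(top3)))      # 52 min 10 sec
print("escape:  ", fmt(total_seconds(escape)))    # 21 min 10 sec
print("all nine:", fmt(total_seconds(runtimes)))  # 76 min 32 sec
```

So the three slowest tests account for about 52 of the roughly 76 minutes spent in the nine tests listed.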