HiveQA is taking too long to run.
I browsed the test results a bit; the main offenders are, unsurprisingly,
the various CliDrivers.
I think there’s a JIRA in progress to speed up the Tez CLI driver, and
Spark and HBase have tolerable runtimes.

That leaves us with the base CliDriver and MR.
Base tests generally take 0-30 seconds, 1-2 minutes at most, but a few
have ridiculous runtimes (these are fairly consistent between runs):
testCliDriver_rcfile_merge1                     31 min
testCliDriver_escape2                           13 min
testCliDriver_escape1                            8 min 10 sec
testCliDriver_dynpart_sort_opt_vectorization     4 min 47 sec
testCliDriver_unionDistinct_1                    4 min 32 sec
testCliDriver_dynpart_sort_optimization          4 min  2 sec
testCliDriver_rcfile_merge2                      3 min 55 sec
testCliDriver_vector_leftsemi_mapjoin            3 min 53 sec
testCliDriver_archive_excludeHadoop20            3 min 13 sec


If we remove or rein in the top 3 tests (rcfile_merge1, escape2, escape1),
the testCliDriver runtime will go down by about 52 minutes, i.e. almost an
hour.
Is anyone particularly attached to the rcfile tests? It’s all good to test
rcfile, but it’s a rarely used format now that Avro, ORC and Parquet have
seemingly taken over (not to speak of Text), and the test should not take
half an hour. I suggest we disable this test (rcfile_merge1) and file a
JIRA to investigate its perf if someone feels it’s important.
Another work item is to look at why the escape tests take so long. Escaping
should be a simple thing to test, not 21 minutes in aggregate (most tests
finish in 0-2 minutes).
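
For anyone who wants to double-check the "almost an hour" estimate, here is a
rough sketch that sums the slowest entries from the list above. The durations
are copied from this mail; the helper names (to_seconds, top_n_total, fmt) are
made up for illustration and are not part of any Hive tooling.

```python
import re

# Runtimes copied from the list in this mail (test name -> duration string).
RUNTIMES = {
    "testCliDriver_rcfile_merge1": "31 min",
    "testCliDriver_escape2": "13 min",
    "testCliDriver_escape1": "8 min 10 sec",
    "testCliDriver_dynpart_sort_opt_vectorization": "4 min 47 sec",
    "testCliDriver_unionDistinct_1": "4 min 32 sec",
    "testCliDriver_dynpart_sort_optimization": "4 min 2 sec",
    "testCliDriver_rcfile_merge2": "3 min 55 sec",
    "testCliDriver_vector_leftsemi_mapjoin": "3 min 53 sec",
    "testCliDriver_archive_excludeHadoop20": "3 min 13 sec",
}


def to_seconds(text):
    """Parse strings like '8 min 10 sec' or '31 min' into seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


def top_n_total(runtimes, n):
    """Total seconds spent in the n slowest tests."""
    durations = sorted((to_seconds(v) for v in runtimes.values()), reverse=True)
    return sum(durations[:n])


def fmt(total_seconds):
    """Format seconds back into 'M min S sec'."""
    return f"{total_seconds // 60} min {total_seconds % 60} sec"


if __name__ == "__main__":
    print("top 3 tests:", fmt(top_n_total(RUNTIMES, 3)))
    print("all listed tests:", fmt(top_n_total(RUNTIMES, len(RUNTIMES))))
```

The top-3 figure is the saving if those tests were dropped outright; reining
them in instead of removing them would save correspondingly less.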

Then, the MiniMR tests take 2 hours. Some GBY-index-specific tests
(gbtoidx) are the worst offenders, to the tune of 35 minutes for 3 tests,
along with smb_mapjoin at 15 minutes.
Since the plan was to drop MR support on master, how about we start by not
running these long MR tests and deprecating the MR engine, while still
keeping it around until it is actually removed?
