[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated HIVE-13496:
----------------------------------
    Attachment: HIVE-13496.01.patch

Patch to make this change. This implements the third point mentioned in the previous post.
If the data does not exist, create it and copy it to a known location for future runs. If the data exists in the known location, copy it over for the current run.
mvn clean gets rid of the cached data, in case it needs to be regenerated.

For "mvn test -Dtest=TestCliDriver -Dqfile="udf_md5.q""

Without patch:
Run1: Total time: 1:09.271s
Run2: Total time: 1:07.661s
Run3: Total time: 1:09.281s

With patch:
Run1: Total time: 1:08.162s
Run2: Total time: 18.754s
Run3: Total time: 18.680s

For Precommit tests, TestCliDriver runs 2131 tests - ~143 batches on 14 nodes - so an average of about 10 batches per node. Looking at existing test results (specifically the mvn output against the test xml) - there's over a minute of data gen overhead per batch on the build machines. At roughly 10 batches per node, this should take 10+ minutes off the runtime.

Only done for TestCliDriver right now. I think we should get this change in (ideally without pre-commit), and then look at the other tests.

[~ashutoshc], [~thejas] - could you please take a look.

> Create initial test data once across multiple test runs
> -------------------------------------------------------
>
>                 Key: HIVE-13496
>                 URL: https://issues.apache.org/jira/browse/HIVE-13496
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Test
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-13496.01.patch
>
>
> All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set
> when they start up. When running on a box with SSDs - this step takes over a
> minute.
> Running a single qtest cannot be faster than this. On the ptest framework -
> all batches end up doing this which is a lot of wastage.
> Instead, this data generation should be shared across runs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
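
The generate-once, copy-on-later-runs scheme described in the comment above can be illustrated with a minimal sketch. This is not the code in HIVE-13496.01.patch; it is a Python illustration under stated assumptions. The names `prepare_test_data`, `cache_dir`, `run_dir`, and `generate` are hypothetical, and deleting `cache_dir` stands in for what `mvn clean` does to the cached data.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_test_data(cache_dir: Path, run_dir: Path, generate) -> bool:
    """Populate run_dir with test data, reusing cache_dir when it exists.

    Returns True if the data came from the cache, False if it was generated.
    All names here are hypothetical; this is not the Hive implementation.
    """
    if cache_dir.is_dir():
        # Cache hit: copy the previously generated data over for this run.
        shutil.copytree(cache_dir, run_dir, dirs_exist_ok=True)
        return True
    # Cache miss: generate the data, then keep a copy for future runs.
    run_dir.mkdir(parents=True, exist_ok=True)
    generate(run_dir)
    shutil.copytree(run_dir, cache_dir)
    return False


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())

    def fake_generate(target: Path) -> None:
        # Stand-in for the expensive data generation step.
        (target / "src.txt").write_text("sample rows\n")

    for run in ("run1", "run2", "run3"):
        cached = prepare_test_data(base / "cache", base / run, fake_generate)
        print(run, "served from cache" if cached else "generated data")
```

Only the first call pays the generation cost, which matches the shape of the timings reported above (Run1 slow, Run2 and Run3 fast). A more careful version would probably write the cache to a temporary directory and rename it into place, so that an interrupted copy cannot leave a partial cache behind.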