Sahil Takiar created HIVE-19784:
-----------------------------------

             Summary: Regression test selection framework for ptest
                 Key: HIVE-19784
                 URL: https://issues.apache.org/jira/browse/HIVE-19784
             Project: Hive
          Issue Type: Sub-task
          Components: Testing Infrastructure
            Reporter: Sahil Takiar


Regression test selection (RTS) is a methodology for decreasing the number of 
tests that are run in regression test suites. The idea is that, for a given 
change, only the tests relevant to that change are run, rather than all the 
tests.

For example, right now Hive QA runs all the {{standalone-metastore}} tests for 
every patch. However, most of the time this isn't necessary. If a patch is only 
modifying files in {{ql}} or {{common}} there is no need to run 
{{standalone-metastore}} tests as there is no dependency from the 
{{standalone-metastore}} to any other Hive module (except for 
{{storage-api}}).

RTS is commonly used in CI systems. Google has published some interesting info 
on how they do this:
* http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html
* https://drive.google.com/file/d/0Bx-FLr0Egz9zYXJfMEZ6NERTbkU/view
* [Bazel|https://bazel.build/] seems to provide some functionality to do this: 
http://code.hootsuite.com/faster-automated-tests-bazel/

There are a few other open-source projects that offer different ways of doing 
this: [Ekstazi|http://ekstazi.org/]

A short-term solution would be to implement the following:
* Before each Hive QA run, parse the Maven dependency graph
* Take the specified patch and check which Maven modules it modifies
* Run the tests contained in the modified modules and in the modules that depend on them
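
The three steps above could be sketched roughly as follows. This is only an illustration, not ptest code: the function name {{modules_to_test}}, the hand-written {{DEPENDS_ON}} graph, and the rule that a file outside any known module triggers a full run are all assumptions. Parsing the real Maven dependency graph is left out, and the sketch assumes each top-level directory in the patch corresponds to one Maven module.

```python
from collections import defaultdict, deque


def modules_to_test(changed_files, depends_on):
    """Return the modified modules plus every module that transitively depends on them.

    depends_on maps a module name to the list of modules it depends on.
    """
    # Step 2: map each changed file to a module (assumption: the first path
    # component is the module name).
    modified = set()
    for path in changed_files:
        top = path.split("/", 1)[0]
        if top not in depends_on:
            # Assumption: a change outside any known module (e.g. the root
            # pom.xml) is treated conservatively and selects every module.
            return set(depends_on)
        modified.add(top)

    # Invert the graph: module -> modules that directly depend on it.
    dependents = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    # Step 3: breadth-first walk over the inverted graph to collect all
    # transitive dependents of the modified modules.
    selected = set(modified)
    queue = deque(modified)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected


# Hypothetical, heavily simplified module graph (not Hive's real one).
DEPENDS_ON = {
    "storage-api": [],
    "standalone-metastore": ["storage-api"],
    "common": ["storage-api"],
    "ql": ["common", "standalone-metastore"],
}

if __name__ == "__main__":
    print(sorted(modules_to_test(["common/src/java/Foo.java"], DEPENDS_ON)))
```

With this toy graph, a patch that only touches {{common}} selects {{common}} and {{ql}} but not {{standalone-metastore}}, while a patch to {{storage-api}} selects every module.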



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
