[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128778#comment-17128778 ]
Raymond Xu edited comment on HUDI-781 at 6/9/20, 2:41 AM:
----------------------------------------------------------

[~yanghua] [~vinoth] [~nishith29] [~garyli1019]

Here is an execution plan for the subtasks:

* To begin with, I'm trying to finish subtask #1, as it can be a quick win. As shown in [https://github.com/apache/hudi/pull/1619#issuecomment-627610722], we can reduce CI time by 10+ min simply by splitting the test tasks.
* In parallel, we can start #3. The proposed `hudi-testutils` module is meant to encompass all `testutils` from each module, which makes the test dependencies clearer. It will also clean up some misplaced tests found during the package restructure.
** org.apache.hudi.execution.TestBoundedInMemoryQueue in `hudi-client` should be put in `hudi-common` (misplaced due to client test harness dependency)
** org.apache.hudi.utilities.inline.fs.TestParquetInLining in `hudi-utilities` should be put in `hudi-common` (misplaced due to data generator dependency)
* Once a minimum setup of `hudi-testutils` is done, we can start #4.
** Implement a shared Spark session provider there
** Use the shared Spark session provider for test suites, which group functional tests with similar setup/teardown logic (we may need to figure out the JUnit 5 equivalent of JUnit 4 test suites with Rule / ClassRule)
** By adopting the new provider class in functional tests one by one, we should start observing reduced test time in the hudi-client module and others
* #2 and #5 can be done in parallel.

Each subtask has its own detailed points in its ticket. Please review this rough plan and give feedback accordingly. Thanks!

was (Author: rxu):
[~yanghua] [~vinoth] [~nishith29] [~garyli1019]

Here is an execution plan for the subtasks:

* To begin with, I'm trying to finish subtask #1, as it can be a quick win. As shown in [https://github.com/apache/hudi/pull/1619#issuecomment-627610722], we can reduce CI time by 10+ min simply by splitting the test tasks.
* In parallel, we can start #3. The proposed `hudi-testutils` module is meant to encompass all `testutils` from each module, which makes the test dependencies clearer. It will also clean up some misplaced tests found during the package restructure.
** org.apache.hudi.execution.TestBoundedInMemoryQueue in `hudi-client` should be put in `hudi-common` (due to client test harness dependency)
** org.apache.hudi.utilities.inline.fs.TestParquetInLining in `hudi-utilities` should be put in `hudi-common` (due to data generator dependency)
* Once a minimum setup of `hudi-testutils` is done, we can start #4.
** Implement a shared Spark session provider there
** Use the shared Spark session provider for test suites, which group functional tests with similar setup/teardown logic (we may need to figure out the JUnit 5 equivalent of JUnit 4 test suites with Rule / ClassRule)
** By adopting the new provider class in functional tests one by one, we should start observing reduced test time in the hudi-client module and others
* #2 and #5 can be done in parallel.

Each subtask has its own detailed points in its ticket. Please review this rough plan and give feedback accordingly. Thanks!

> Re-design test utilities
> ------------------------
>
>                 Key: HUDI-781
>                 URL: https://issues.apache.org/jira/browse/HUDI-781
>             Project: Apache Hudi
>          Issue Type: Test
>          Components: Testing
>            Reporter: Raymond Xu
>            Priority: Major
>
> Test utility classes are to be re-designed with considerations like
> * Use more mocking
> * Reduce Spark context setup
> * Improve/clean up data generator
> An RFC would be preferred for illustrating the design work.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)