[ https://issues.apache.org/jira/browse/HUDI-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-2267:
-----------------------------
    Fix Version/s: 0.11.0
                       (was: 0.10.0)

> Test suite infra: automate with playbook
> -----------------------------------------
>
>                 Key: HUDI-2267
>                 URL: https://issues.apache.org/jira/browse/HUDI-2267
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Usability
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
> Build a test infra (a suite of tests) that can be run with Jenkins or CI
> (optionally), and that can also be scripted to run on a cluster/AWS infra.
>
> Purpose:
> There are a lot of additional features in Hudi that do not get tested when
> new features are being developed. Non-core features such as clustering,
> archival, and the bulk_insert row writer path don't get the necessary
> attention while a particular feature is being worked on. So we need a test
> infra that anyone can leverage. One should be able to trigger a script called
> certify_patch (or similar) and have it run all the different flows a user
> could possibly hit in the wild, then report whether every flow succeeded or
> something failed.
>
> Operations to be verified:
> For both table types:
> * bulk_insert, insert, upsert, delete, insert_overwrite,
>   insert_overwrite_table, delete_partition.
> * bulk_insert row writer with the above operations.
> * Verify that cleaning and archival get triggered and executed as expected
>   for both of the above flows.
> * Clustering.
> * Metadata table.
> For MOR:
> * Compaction.
> * Clustering and compaction one after another.
> * Clustering and compaction triggered concurrently.
>
> Note: for all tests, verify the sanity of the data after every test, i.e.
> save the input data and verify it against the Hudi dataset (a minimal sketch
> of such a check follows below).
> * The test infra should be able to test with a schema of the user's choice.
> * It should be able to test all 3 levels (write client, deltastreamer, spark
>   datasource). Some operations may not be feasible to test at every level,
>   but that is understandable.
> * Once we have end-to-end support for Spark, we need to add support for
>   Flink and Java as well. The scope for Java may be smaller since there is
>   no Spark datasource layer, but we can revisit that once the Spark engine
>   is covered.
>
> Publish a playbook on how to use this test infra, both with an already
> released version and with a locally built Hudi bundle jar:
> * cluster/AWS run
> * local docker run
> * CI integration
>
> Future scope:
> We can make the versions of Spark, Hadoop, Hive, etc. configurable down the
> line, but for the first cut we want to get an end-to-end flow working
> smoothly. It should be usable by anyone from the community or by a new user
> who is looking to adopt Hudi.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
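As a rough illustration of the per-test sanity check described in the issue (save the input, run one write operation at the Spark datasource level, read the table back and diff against the input), here is a minimal Scala sketch. The paths, the record key/precombine column names, and the verifyOperation helper are illustrative assumptions for this sketch, not part of the issue or of any existing Hudi test-suite API; the Hudi write options used are the standard Spark datasource options.

```scala
// Minimal sketch: exercise one Hudi write operation via the Spark datasource
// and verify data sanity by reading the table back and diffing it against
// the saved input. basePath, tableName, and the column names are assumed.
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hudi-test-suite-sketch")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val basePath  = "file:///tmp/hudi_suite/trips"   // assumed local table path
val tableName = "trips"                          // assumed table name

// Hypothetical helper: write the input with the given operation, then check
// that every input key is present in a snapshot read of the Hudi table.
// (This check suits insert/upsert-style operations; deletes would assert the
// opposite.)
def verifyOperation(input: DataFrame, operation: String): Unit = {
  input.write.format("hudi")
    .option("hoodie.table.name", tableName)
    .option("hoodie.datasource.write.operation", operation)
    .option("hoodie.datasource.write.recordkey.field", "uuid")          // assumed schema
    .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode(SaveMode.Append)
    .save(basePath)

  val snapshot = spark.read.format("hudi").load(basePath)
  val missing  = input.select("uuid").except(snapshot.select("uuid")).count()
  assert(missing == 0, s"$operation lost $missing record(s)")
}

// Example usage: certify a couple of flows back to back.
// verifyOperation(firstBatch, "bulk_insert")
// verifyOperation(updates, "upsert")
```

The same pattern could be repeated for the other operations listed in the issue (insert_overwrite, delete_partition, the row-writer path, etc.), with the expected post-condition adjusted per operation.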