[ 
https://issues.apache.org/jira/browse/HUDI-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2267:
-----------------------------
    Fix Version/s: 0.11.0
                       (was: 0.10.0)

> Test suite infra Automate with playbook
> ---------------------------------------
>
>                 Key: HUDI-2267
>                 URL: https://issues.apache.org/jira/browse/HUDI-2267
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Usability
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Build a test infra (a suite of tests) that can be run with Jenkins or CI 
> (optionally), and also script it to run on cluster/AWS infra.
> Purpose:
> Many additional features in Hudi do not get tested when new features are 
> developed. Non-core features such as clustering, archival, and the 
> bulk_insert row writer path don't get the necessary attention while a 
> particular feature is being developed. So we need a test infra that anyone 
> can leverage. One should be able to trigger a script called certify_patch or 
> something similar, and it should run all the different tests that one could 
> possibly hit out there in the wild and report whether all flows succeeded or 
> anything failed.
> Operations to be verified:
> For both types of table:
> bulk insert, insert, upsert, delete, insert overwrite, insert overwrite table, 
> delete partition.
> bulk_insert row writer with above operations.
> Test that cleaning and archival get triggered and executed as expected for 
> both of the above flows.
> Clustering.
> Metadata table.
> For MOR:
> Compaction
> Clustering and compaction one after another.
> Clustering and compaction triggered concurrently.
> Note: For all tests, verify the sanity of the data after every test, i.e. 
> save the input data and verify it against the Hudi dataset.
>  * Test infra should have capability to test with schema of user's choice.
>  * Should be able to test all 3 levels (write client, deltastreamer, Spark 
> datasource). Some operations may not be feasible to test at all levels, but 
> that's understandable.
>  * Once we have end-to-end support for Spark, we need to add support for 
> Flink and Java as well. Scope for Java might be smaller since there is no 
> Spark datasource layer. But we can revisit later once we have covered the 
> Spark engine.
> Publish a playbook on how to use this test infra, either with an already 
> released version or with a locally built Hudi bundle jar.
>  * cluster/AWS run
>  * local docker run.
>  * CI integration
> Future scope:
> We can make the versions of Spark, Hadoop, Hive, etc. configurable down the 
> line, but for the first cut we want to get an end-to-end flow working 
> smoothly. It should be usable by anyone from the community or a new user who 
> is looking to use Hudi.
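
For illustration only, the operation matrix in the description above could be enumerated roughly as follows. This is a minimal sketch, not the test suite's actual design: the function `build_matrix` and all of the label strings are hypothetical names chosen for this example, and the assumption that every operation applies at every level is a simplification (the description itself notes that some combinations may not be feasible).

```python
from itertools import product

# Hypothetical labels derived from the issue description; they are not
# meant to match real Hudi configuration keys or option values.
TABLE_TYPES = ["COPY_ON_WRITE", "MERGE_ON_READ"]
WRITE_LEVELS = ["write_client", "deltastreamer", "spark_datasource"]
OPERATIONS = [
    "bulk_insert", "insert", "upsert", "delete",
    "insert_overwrite", "insert_overwrite_table", "delete_partition",
]
# Flows the description lists only under "For MOR".
MOR_ONLY_FLOWS = [
    "compaction",
    "clustering_then_compaction",
    "clustering_with_concurrent_compaction",
]


def build_matrix():
    """Return (table_type, write_level, flow) tuples to be exercised."""
    # Write operations apply to both table types at every level (assumed).
    cases = list(product(TABLE_TYPES, WRITE_LEVELS, OPERATIONS))
    # Compaction-related flows are added for MERGE_ON_READ tables only.
    cases += [
        ("MERGE_ON_READ", level, flow)
        for level, flow in product(WRITE_LEVELS, MOR_ONLY_FLOWS)
    ]
    return cases


if __name__ == "__main__":
    for case in build_matrix():
        print(case)
```

A runner such as the proposed certify_patch script could iterate over these tuples, skip the combinations that are not feasible at a given level, and report pass or fail per tuple.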

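The data sanity check mentioned in the note above (save the input data and verify it against the Hudi dataset) might look something like the sketch below. It is written in plain Python over lists of dicts purely for illustration; `verify_records` and the `id` key field are hypothetical, and the sketch assumes the records read back from the table have already been collected and stripped of Hudi's metadata columns.

```python
def verify_records(expected, actual, key="id"):
    """Compare saved input records with records read back from the table.

    Both arguments are lists of dicts; `key` names the record key field.
    Returns the record keys that are missing, unexpected, or mismatched.
    """
    expected_by_key = {record[key]: record for record in expected}
    actual_by_key = {record[key]: record for record in actual}
    common = expected_by_key.keys() & actual_by_key.keys()
    return {
        "missing": sorted(expected_by_key.keys() - actual_by_key.keys()),
        "unexpected": sorted(actual_by_key.keys() - expected_by_key.keys()),
        "mismatched": sorted(
            k for k in common if expected_by_key[k] != actual_by_key[k]
        ),
    }


# Example: one record changed and one extra record appeared.
saved_input = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
read_back = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 3, "v": "c"}]
report = verify_records(saved_input, read_back)
```

A test would pass only when all three lists in the report are empty.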


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
