[ https://issues.apache.org/jira/browse/HUDI-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-3469:
-----------------------------
    Status: Patch Available  (was: In Progress)

> Refactor HoodieTestDataGenerator to enable reproducible builds
> --------------------------------------------------------------
>
>                 Key: HUDI-3469
>                 URL: https://issues.apache.org/jira/browse/HUDI-3469
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Currently, `HoodieTestDataGenerator` relies on static state, so its state is
> shared across all tests and the generated data depends on the order in which
> the tests execute.
>
> Instead, we should properly abstract `HoodieTestDataGenerator` to hold all of
> its state within individual instances, so that individual tests can:
> 1. Create their own isolated instance (which won't be affected by other tests)
> 2. Pass a "seed" value to the DataGenerator to initialize its PRNG, so that it
> always produces the same (pseudo-)random sequence for a given seed
> 3. Be certain that all of the data produced by the DataGenerator is 100%
> reproducible with the same seed (meaning that all DataGenerator operations
> rely only on this internal PRNG and not on any external sources, such as
> `UUID.randomUUID()`, `System.currentTimeMillis()`, etc.)

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
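
The three requirements in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions: `SeededRecordGenerator`, its methods, and the fixed base timestamp are hypothetical names chosen for this example and are not the actual `HoodieTestDataGenerator` API or the change made in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

// Hypothetical illustration; not the actual HoodieTestDataGenerator API.
final class SeededRecordGenerator {
    // Arbitrary fixed base timestamp (an assumption for this sketch),
    // used in place of System.currentTimeMillis().
    private static final long BASE_TIMESTAMP_MS = 1_600_000_000_000L;

    // All mutable state lives in the instance; nothing is shared statically.
    private final Random random;

    SeededRecordGenerator(long seed) {
        // The same seed always yields the same pseudo-random sequence.
        this.random = new Random(seed);
    }

    // Builds a UUID-shaped key from the instance PRNG instead of
    // UUID.randomUUID(). The version/variant bits are not set, so this is
    // not a spec-compliant version-4 UUID.
    String nextRecordKey() {
        return new UUID(random.nextLong(), random.nextLong()).toString();
    }

    // Derives a timestamp from the PRNG: the base plus an offset of
    // less than one day in milliseconds.
    long nextTimestamp() {
        return BASE_TIMESTAMP_MS + random.nextInt(86_400_000);
    }

    // Produces `count` records in the form "key,timestamp".
    List<String> generateRecords(int count) {
        List<String> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            records.add(nextRecordKey() + "," + nextTimestamp());
        }
        return records;
    }
}
```

In this sketch, two tests that each construct their own generator with the same seed get identical records regardless of execution order, because no value comes from a static field, the wall clock, or an unseeded random source.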