Dear community,

I am happy to share the Hudi community updates for 2021-01-31 ~ 2021-02-28, covering features, bug fixes, and tests.

=======================================
Features

[Core] Improve minKey/maxKey computation in HoodieHFileWriter [1]
[Flink] Introduce FlinkHoodieSimpleIndex to hudi-flink-client [2]
[Flink Integration] InstantGenerateOperator supports multiple parallelism [3]
[Flink Integration] Introduce FlinkHoodieBloomIndex to hudi-flink-client [4]
[CLI] Add commit_show_records_info to display record sizes for a commit [5]
[Flink Integration] Make the write task of the Flink write pipeline scalable [6]
[Spark Integration] Translate the partitionBy API in the Spark datasource to hoodie.datasource.write.partitionpath.field [7]
[Flink Integration] Write as minor batches during one checkpoint interval for the new writer [8]
[Spark Integration] Support Spark Structured Streaming reads from a Hudi table [9]
[Flink Integration] Get the parallelism from context when initializing StreamWriteOperatorCoordinator [10]
[Core] Schedule compaction based on time elapsed [11]
[Metaclient] Add a builder for HoodieTableMetaClient initialization [12]
[Core] Remove inline inflight rollback in hoodie writer [13]
[Flink Integration] Reduce the coupling with Hadoop [14]
[Flink Integration] The state-based index should bootstrap from existing base files [15]
[Java Client] Support copyOnWriteTable in the Java client [16]
[Flink Integration] Avoid renaming for bucket update when there is only one flush action during a checkpoint [17]
[Flink Integration] Some improvements to BucketAssignFunction [18]
[DeltaStreamer] Make DeltaStreamer transition from dfsSource to kafkaSource [19]
[Hive Integration] Make it configurable whether a Hive connection failure affects the Hudi ingest process [20]
[Metadata Table] Add a configuration to allow specific directories to be filtered out during Metadata Table bootstrap [21]

[1] https://issues.apache.org/jira/browse/HUDI-1519
[2] https://issues.apache.org/jira/browse/HUDI-1335
[3] https://issues.apache.org/jira/browse/HUDI-1511
[4] https://issues.apache.org/jira/browse/HUDI-1332
[5] https://issues.apache.org/jira/browse/HUDI-1571
[6] https://issues.apache.org/jira/browse/HUDI-1557
[7] https://issues.apache.org/jira/browse/HUDI-1526
[8] https://issues.apache.org/jira/browse/HUDI-1598
[9] https://issues.apache.org/jira/browse/HUDI-1109
[10] https://issues.apache.org/jira/browse/HUDI-1621
[11] https://issues.apache.org/jira/browse/HUDI-1381
[12] https://issues.apache.org/jira/browse/HUDI-1315
[13] https://issues.apache.org/jira/browse/HUDI-1486
[14] https://issues.apache.org/jira/browse/HUDI-1586
[15] https://issues.apache.org/jira/browse/HUDI-1624
[16] https://issues.apache.org/jira/browse/HUDI-1477
[17] https://issues.apache.org/jira/browse/HUDI-1637
[18] https://issues.apache.org/jira/browse/HUDI-1638
[19] https://issues.apache.org/jira/browse/HUDI-1367
[20] https://issues.apache.org/jira/browse/HUDI-1269
[21] https://issues.apache.org/jira/browse/HUDI-1611

=======================================
Bugs

[Core] Honor ordering field for MOR Spark datasource reader [1]
[Core] Call mkdir(partition) only if it does not exist [2]
[Core] Try to init class by trying different signatures instead of checking its name [3]
[Core] IHoodieTableMetaClient.getMarkerFolderPath works incorrectly on a Windows client with an HDFS server because of the wrong file separator [4]
[Core] Fix Rollback Metadata AVRO backwards incompatibility [5]
[Core] Fix DefaultHoodieRecordPayload serialization failure [6]
[Hive Integration] Throw a RuntimeException when syncHoodieTable() fails [7]
[Core] Fix bug in HoodieCombineRealtimeRecordReader when reading empty iterators [8]
[HBase Index] Fix HBase index to make rollback synchronous (via config) [9]

[1] https://issues.apache.org/jira/browse/HUDI-1550
[2] https://issues.apache.org/jira/browse/HUDI-1523
[3] https://issues.apache.org/jira/browse/HUDI-1538
[4] https://issues.apache.org/jira/browse/HUDI-1420
[5] https://issues.apache.org/jira/browse/HUDI-1589
[6] https://issues.apache.org/jira/browse/HUDI-1603
[7] https://issues.apache.org/jira/browse/HUDI-1582
[8] https://issues.apache.org/jira/browse/HUDI-1539
[9] https://issues.apache.org/jira/browse/HUDI-1347

=======================================
Tests

[Tests] CI intermittent failure: TestJsonStringToHoodieRecordMapFunction [1]
[Tests] Add test cases for INSERT_OVERWRITE Operation [2]
[Tests] Fix write test flakiness in StreamWriteITCase [3]
[CI] Add Azure Pipelines configs [4]

[1] https://issues.apache.org/jira/browse/HUDI-1547
[2] https://issues.apache.org/jira/browse/HUDI-1545
[3] https://issues.apache.org/jira/browse/HUDI-1612
[4] https://issues.apache.org/jira/browse/HUDI-1620

Best,
Leesf
