Dear community,

Here are the Hudi community bi-weekly updates for 2022-02-23 ~ 2022-02-27, covering features, bug fixes, and tests.

=======================================
Features

[Spark] Introduce HoodieCatalog to manage tables for Spark Datasource V2 [1]
[Core] Add new cleaning policy based on number of hours [2]
[Spark] Upgrade Spark to 3.2.1 [3]
[Spark] Add Call Procedure command for Spark SQL [4]
[Spark] Support Metadata Table in Spark Datasource [5]

[1] https://issues.apache.org/jira/browse/HUDI-3254
[2] https://issues.apache.org/jira/browse/HUDI-349
[3] https://issues.apache.org/jira/browse/HUDI-3432
[4] https://issues.apache.org/jira/browse/HUDI-3161
[5] https://issues.apache.org/jira/browse/HUDI-1296

=======================================
Bugs

[Core] Fix SQL source's checkpoint issue [1]
[Core] The files recorded in the commit may not match the actual ones for MOR compaction [2]
[Core] TypedProperties need not create a new set when checking whether a key exists [3]
[Core] If mode==ignore && tableExists, do not execute write logic and sync hive [4]
[Core] Fix the build on aarch64, Fedora 33 [5]
[Core] Fix TableSchemaResolver for all file formats and metadata table [6]
[Core] Deprecate hoodie.file.index.enable and unify on BaseFileOnlyViewRelation [7]
[Core] Make archiving an async service [8]
[Core] Fix problem that Spark with TimestampKeyGenerator returns no results when querying by partition column [9]
[Core] Add config to disable table services [10]
[Core] Remove hardcoded logic of disabling metadata table in tests [11]
[Core] Clean up Hive-related hierarchies after refactoring [12]
[Core] Sync datasource clustering config [13]
[Core] Introduce a checksum mechanism for validating hoodie.properties [14]
[Deltastreamer] Fix Deltastreamer to properly shut down the services upon failure [15]
[Core] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 [16]
[Spark] Fix ColumnarArrayData ClassCastException issue [17]
[Flink] Support batch reader in BootstrapOperator#loadRecords [18]
[Core] Fix BulkInsertPartitioner generic type [19]
[Core] Retry FileSystem action instead of failing directly [20]
[Core] Fix restore with metadata enabled [21]
[Core] Fix checkpoint management in hoodie incr source [22]
[Core] Abstract Spark update strategy to make code cleaner and remove duplicates [23]
[Core] Fix duplicate cleaning of same files when unfinished clean operations are present using a config [24]
[Deltastreamer] Add delete partitions support to DeltaStreamer [25]
[Flink] The archived timeline for Flink streaming reader should not be reused [26]
[Core] Fix wrong field order for constructing HoodieMetadataColumnStats [27]
[Core] The Flink small file list should exclude file slices with pending compaction [28]
[Core] Not able to get execution plan [29]
[Core] Fix NPE caused by incorrect beforeKeyGenClassName validation [30]
[Flink] Add more documentation to Pipelines on using this tool to build a write pipeline [31]
[Core] Pending clustering may break AbstractTableFileSystemView#getxxBaseFile() [32]
[Core] Refactor clustering executors [33]
[Core] Make RDD unpersist optional at the end of writes [34]

[1] https://issues.apache.org/jira/browse/HUDI-2413
[2] https://issues.apache.org/jira/browse/HUDI-3370
[3] https://issues.apache.org/jira/browse/HUDI-3412
[4] https://issues.apache.org/jira/browse/HUDI-3272
[5] https://issues.apache.org/jira/browse/HUDI-1657
[6] https://issues.apache.org/jira/browse/HUDI-3398
[7] https://issues.apache.org/jira/browse/HUDI-3200
[8] https://issues.apache.org/jira/browse/HUDI-1576
[9] https://issues.apache.org/jira/browse/HUDI-3204
[10] https://issues.apache.org/jira/browse/HUDI-2931
[11] https://issues.apache.org/jira/browse/HUDI-3394
[12] https://issues.apache.org/jira/browse/HUDI-3280
[13] https://issues.apache.org/jira/browse/HUDI-3426
[14] https://issues.apache.org/jira/browse/HUDI-2809
[15] https://issues.apache.org/jira/browse/HUDI-3430
[16] https://issues.apache.org/jira/browse/HUDI-3438
[17] https://issues.apache.org/jira/browse/HUDI-3389
[18] https://issues.apache.org/jira/browse/HUDI-3446
[19] https://issues.apache.org/jira/browse/HUDI-3458
[20] https://issues.apache.org/jira/browse/HUDI-2648
[21] https://issues.apache.org/jira/browse/HUDI-3432
[22] https://issues.apache.org/jira/browse/HUDI-3455
[23] https://issues.apache.org/jira/browse/HUDI-3042
[24] https://issues.apache.org/jira/browse/HUDI-2925
[25] https://issues.apache.org/jira/browse/HUDI-2189
[26] https://issues.apache.org/jira/browse/HUDI-3461
[27] https://issues.apache.org/jira/browse/HUDI-3486
[28] https://issues.apache.org/jira/browse/HUDI-3488
[29] https://issues.apache.org/jira/browse/HUDI-3494
[30] https://issues.apache.org/jira/browse/HUDI-3401
[31] https://issues.apache.org/jira/browse/HUDI-3474
[32] https://issues.apache.org/jira/browse/HUDI-3421
[33] https://issues.apache.org/jira/browse/HUDI-3042
[34] https://issues.apache.org/jira/browse/HUDI-3515

=======================================
Tests

[Tests] Remove hardcoded logic of disabling metadata table in tests [1]
[Tests] Enhancements to integ test suite [2]
[Tests] Support clustering scheduleAndExecute for hudi-cli and add clustering-cli tests [3]

[1] https://issues.apache.org/jira/browse/HUDI-3366
[2] https://issues.apache.org/jira/browse/HUDI-3480
[3] https://issues.apache.org/jira/browse/HUDI-3429

Best,
Leesf
