Dear community,

Nice to share Hudi community bi-weekly updates for 2021-11-07 ~ 2021-11-21 with updates on features, bug fixes and tests.

=======================================
Features

[Spark SQL] Add ORC support in Bootstrap Op [1]
[Core] Support records staying in same fileId after clustering [2]
[Flink Integration] Support scheduling online compaction plan when there are no commit data [3]
[Core] Add support for DynamoDb based lock provider [4]
[Core] InLineFS support for S3FS logs [5]
[Spark] Add external config file support [6]
[Spark SQL] Virtual keys support for metadata table [7]
[Core] Added S3 object filter to support multiple S3EventsHoodieIncrSources single S3 meta table [8]
[Core] Add mechanism to safely update, delete and recover table properties [9]

[1] https://issues.apache.org/jira/browse/HUDI-1827
[2] https://issues.apache.org/jira/browse/HUDI-1877
[3] https://issues.apache.org/jira/browse/HUDI-2685
[4] https://issues.apache.org/jira/browse/HUDI-2314
[5] https://issues.apache.org/jira/browse/HUDI-2716
[6] https://issues.apache.org/jira/browse/HUDI-2362
[7] https://issues.apache.org/jira/browse/HUDI-2593
[8] https://issues.apache.org/jira/browse/HUDI-2472
[9] https://issues.apache.org/jira/browse/HUDI-2795

=======================================
Bugs

[Flink] Set up keygen class explicit for write config for flink table upgrade [1]
[Core] bugfix: NPE when select count start from a realtime table with Tez [2]
[Flink] Add more options when initializing table [3]
[Flink] Remove the table source options validation [4]
[Flink] Fixing metadata table updates such that only regular writes from data table can trigger table services in metadata table [5]
[Core] The BitCaskDiskMap iterator may cause memory leak [6]
[Core] Bootstrap metadata table only if upgrade / downgrade is not required [7]
[Deltastreamer] Make deltastreamer checkpoint state merging more explicit [8]
[Core] Estimate available memory size for spillable map accurately [9]
[Hive Integration] Redo the logic of mor_incremental_view for hive [10]
[Core] Change default values for certain clustering configs [11]
[Core] Move EventTimeAvroPayload into hudi-common module [12]
[Core] Improved the metadata table bootstrap for very large tables [13]
[Core] Resolve inconsistent key generation for timestamp types by GenericRecord and Row [14]
[Flink Integration] Remove the bucketAssignFunction useless context [15]
[Flink Integration] Do not bootstrap for flink insert overwrite [16]
[Core] Part1 Setting default parallelism to 200 for some of write configs [17]
[Core] ExternalSpillableMap payload size re-estimation throws ArithmeticException [18]
[Core] Fixing instantiating metadata table config in HoodieFileIndex [19]
[Flink Integration] Fix flink parquet writer decimal type conversion [20]
[Spark SQL] Refactor spark-sql to make consistent with DataFrame api [21]
[Core] Fix parsing of metadata table compaction timestamp when metrics are enabled [22]
[Core] Parallelize deleting archived hoodie commits [23]
[Core] Fixing a bug with rollback of partially failed commit which has new partitions [24]
[Flink Integration] Fix StreamerUtil#medianInstantTime for very near instant time [25]
[Core] Ensure list based rollback strategy is used for restore [26]
[Core] Part3 Enabling marker based rollback as default rollback strategy [27]
[Core] Setting default metadata enable as false for Java [28]
[Flink Integration] Flink batch upsert for non partitioned table does not work [29]
[Flink Integration] Fix the changelog mode of HoodieTableSource [30]
[Core] Avoid deleting all inflight commits heartbeats while rolling back failed writes [31]
[Core] Allows duplicate files for metadata commit [32]
[Flink Integration] Fix flink query operation fields [33]
[Core] Make clustering work regardless of whether there are base file [34]
[Core] Metadata table support for Restore action to first commit [35]
[Core] Add configuration inference logic for few options [36]
[Flink Integration] Add option to skip compaction instants for streaming read [37]
[Flink Integration] Make flink parquet reader compatible with decimal BINARY encoding [38]
[Hive Integration] Update Hive sync timestamp when change detected [39]

[1] https://issues.apache.org/jira/browse/HUDI-2702
[2] https://issues.apache.org/jira/browse/HUDI-313
[3] https://issues.apache.org/jira/browse/HUDI-2709
[4] https://issues.apache.org/jira/browse/HUDI-2698
[5] https://issues.apache.org/jira/browse/HUDI-2595
[6] https://issues.apache.org/jira/browse/HUDI-2715
[7] https://issues.apache.org/jira/browse/HUDI-2591
[8] https://issues.apache.org/jira/browse/HUDI-2579
[9] https://issues.apache.org/jira/browse/HUDI-2297
[10] https://issues.apache.org/jira/browse/HUDI-2086
[11] https://issues.apache.org/jira/browse/HUDI-2442
[12] https://issues.apache.org/jira/browse/HUDI-2730
[13] https://issues.apache.org/jira/browse/HUDI-2634
[14] https://issues.apache.org/jira/browse/HUDI-2495
[15] https://issues.apache.org/jira/browse/HUDI-2738
[16] https://issues.apache.org/jira/browse/HUDI-2746
[17] https://issues.apache.org/jira/browse/HUDI-2151
[18] https://issues.apache.org/jira/browse/HUDI-2718
[19] https://issues.apache.org/jira/browse/HUDI-2741
[20] https://issues.apache.org/jira/browse/HUDI-2756
[21] https://issues.apache.org/jira/browse/HUDI-2706
[22] https://issues.apache.org/jira/browse/HUDI-2744
[23] https://issues.apache.org/jira/browse/HUDI-2683
[24] https://issues.apache.org/jira/browse/HUDI-2712
[25] https://issues.apache.org/jira/browse/HUDI-2769
[26] https://issues.apache.org/jira/browse/HUDI-2753
[27] https://issues.apache.org/jira/browse/HUDI-2151
[28] https://issues.apache.org/jira/browse/HUDI-2734
[29] https://issues.apache.org/jira/browse/HUDI-2789
[30] https://issues.apache.org/jira/browse/HUDI-2790
[31] https://issues.apache.org/jira/browse/HUDI-2641
[32] https://issues.apache.org/jira/browse/HUDI-2791
[33] https://issues.apache.org/jira/browse/HUDI-2798
[34] https://issues.apache.org/jira/browse/HUDI-2731
[35] https://issues.apache.org/jira/browse/HUDI-2796
[36] https://issues.apache.org/jira/browse/HUDI-2242
[37] https://issues.apache.org/jira/browse/HUDI-2804
[38] https://issues.apache.org/jira/browse/HUDI-2392
[39] https://issues.apache.org/jira/browse/HUDI-1932

=======================================
Tests

[Tests] Enabling metadata table in TestHoodieIndex and TestMergeOnReadRollbackActionExecutor [1]
[Tests] Enabling metadata table for TestHoodieMergeOnReadTable and TestHoodieCompactor [2]

[1] https://issues.apache.org/jira/browse/HUDI-2472
[2] https://issues.apache.org/jira/browse/HUDI-2472

Best,
Leesf