Dear community,

Here is the Hudi community bi-weekly update for 2021-07-18 ~ 2021-08-01, covering features, bug fixes, and tests.
=======================================
Features

[Core] Adding support to disable meta columns with bulk insert operation [1]
[DeltaStreamer] Add INSERT_OVERWRITE and INSERT_OVERWRITE_TABLE support to DeltaStreamer [2]
[Spark Integration] MergeInto Support Partial Update For COW [3]
[Hive Integration] DeltaStreamer kafka source supports consuming from specified timestamp [4]
[Hive Integration] Adding support for HMS for running DDL queries in hive-sync [5]
[Docs] Automate the generation of configs webpage as configs are added to Hudi repo [6]
[Core] Adding virtual key support to COW table [7]
[Flink Integration] Add rateLimiter when Flink writes to hudi [8]
[Core] Integrate consumers with rocksDB and compression within External Spillable Map [9]
[Flink Integration] Add option 'hive_sync.mode' for flink writer [10]
[Spark Integration] Explicit parallelism for flink bulk insert [11]
[Hive Integration] Support setting hive sync partition extractor class based on flink configuration [12]

[1] https://issues.apache.org/jira/browse/HUDI-2161
[2] https://issues.apache.org/jira/browse/HUDI-1860
[3] https://issues.apache.org/jira/browse/HUDI-1884
[4] https://issues.apache.org/jira/browse/HUDI-1447
[5] https://issues.apache.org/jira/browse/HUDI-1848
[6] https://issues.apache.org/jira/browse/HUDI-1241
[7] https://issues.apache.org/jira/browse/HUDI-2176
[8] https://issues.apache.org/jira/browse/HUDI-2215
[9] https://issues.apache.org/jira/browse/HUDI-2044
[10] https://issues.apache.org/jira/browse/HUDI-2228
[11] https://issues.apache.org/jira/browse/HUDI-2241
[12] https://issues.apache.org/jira/browse/HUDI-2184

=======================================
Bugs

[Flink Integration] Remove state in BootstrapFunction [1]
[Flink Integration] Create new bucket when NewFileAssignState filled [2]
[Flink Integration] Clean and reset the bootstrap events for coordinator when task failover [3]
[Code Cleanup] Clean up Multiple versions of scala libraries detected Warning [4]
[Flink Integration] Add marker files for flink writer [5]
[Spark Integration] Sync Hive Failed When Execute CTAS In Spark2 And Spark3 [6]
[Core] Fix checkpoint blocked because the getLastPendingInstant() action runs after the restoreWriteMetadata() action [7]
[Flink Integration] Rollback inflight compaction for flink writer [8]
[Spark Integration] MergeInto MOR Table May Result In Incorrect Result [9]
[Spark Integration] Missing PrimaryKey In Hoodie Properties For CTAS Table [10]
[Core] Residual temporary files after clustering are not cleaned up [11]
[Core] Fix NPE of HoodieConfig [12]
[Core] Fix no value present in incremental query on MOR [13]
[Spark Integration] Fix Alter Partitioned Table Failed [14]
[Flink Integration] Only sync hive meta on successful commit for flink batch writer [15]
[Core] Make codahale times transient to avoid serializable exceptions [16]
[Core] BucketAssigner generates the fileId evenly to avoid data skew [17]
[Hive Integration] Fix database alreadyExists exception while hive sync [18]
[Spark Integration] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write [19]
[Spark Integration] Unpersist the input rdd after the commit is completed to save the memory space for inline compaction [20]
[Spark Integration] Fix Exception Caused By Table Name Case Sensitivity For Append Mode Write [21]
[Flink Integration] Default consumes from the latest instant for flink streaming reader [22]
[Flink Integration] Builtin sort operator for flink bulk insert [23]
[Core] Fix missing HoodieWriteStat in HoodieCreateHandle [24]

[1] https://issues.apache.org/jira/browse/HUDI-2193
[2] https://issues.apache.org/jira/browse/HUDI-2145
[3] https://issues.apache.org/jira/browse/HUDI-2198
[4] https://issues.apache.org/jira/browse/HUDI-2192
[5] https://issues.apache.org/jira/browse/HUDI-2204
[6] https://issues.apache.org/jira/browse/HUDI-2195
[7] https://issues.apache.org/jira/browse/HUDI-2206
[8] https://issues.apache.org/jira/browse/HUDI-2205
[9] https://issues.apache.org/jira/browse/HUDI-2139
[10] https://issues.apache.org/jira/browse/HUDI-2212
[11] https://issues.apache.org/jira/browse/HUDI-2214
[12] https://issues.apache.org/jira/browse/HUDI-2219
[13] https://issues.apache.org/jira/browse/HUDI-2217
[14] https://issues.apache.org/jira/browse/HUDI-2223
[15] https://issues.apache.org/jira/browse/HUDI-2227
[16] https://issues.apache.org/jira/browse/HUDI-2240
[17] https://issues.apache.org/jira/browse/HUDI-2245
[18] https://issues.apache.org/jira/browse/HUDI-2244
[19] https://issues.apache.org/jira/browse/HUDI-1425
[20] https://issues.apache.org/jira/browse/HUDI-2117
[21] https://issues.apache.org/jira/browse/HUDI-2251
[22] https://issues.apache.org/jira/browse/HUDI-2252
[23] https://issues.apache.org/jira/browse/HUDI-2254
[24] https://issues.apache.org/jira/browse/HUDI-2218

=======================================
Tests

[Tests] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node [1]
[Tests] Fix NullPointerException in TestHoodieConsoleMetrics [2]
[Tests] Refactoring a few tests to reduce running time: DeltaStreamer and MultiDeltaStreamer tests, and bulk insert row writer tests [3]

[1] https://issues.apache.org/jira/browse/HUDI-2007
[2] https://issues.apache.org/jira/browse/HUDI-2211
[3] https://issues.apache.org/jira/browse/HUDI-2253

Best,
Leesf