Dear community,

Here are the Hudi community bi-weekly updates for 2021-04-25 ~ 2021-05-09, covering features, bug fixes, and tests.

=======================================
Features

[Flink Integration] Add option to flush when total buckets memory exceeds the threshold [1]
[Core] Add optional instant range to log record scanner for log [2]
[Deltastreamer] Improve table level config priority for HoodieMultiTableDeltaStreamer [3]
[Flink Integration] Tweak the min max commits to keep when setting up cleaning retain commits for Flink [4]
[Flink Integration] Logging consuming instant to StreamReadOperator#processSplits [5]
[Spark Integration] use jsc union instead of rdd union [6]
[Flink Integration] Add rate limiter to Flink writer to avoid OOM for bootstrap [7]
[Flink Integration] Streaming read for Flink COW table [8]
[Deltastreamer] Add SCHEMA_REGISTRY_SOURCE_URL_SUFFIX and SCHEMA_REGISTRY_TARGET_URL_SUFFIX property [9]
[Flink Integration] Remove legacy code for Flink writer [10]
[Flink Integration] Support streaming read with compaction and cleaning [11]
[Flink Integration] Add max memory option for flink writer task [12]

[1] https://issues.apache.org/jira/browse/HUDI-1844
[2] https://issues.apache.org/jira/browse/HUDI-1837
[3] https://issues.apache.org/jira/browse/HUDI-1742
[4] https://issues.apache.org/jira/browse/HUDI-1841
[5] https://issues.apache.org/jira/browse/HUDI-1836
[6] https://issues.apache.org/jira/browse/HUDI-1690
[7] https://issues.apache.org/jira/browse/HUDI-1863
[8] https://issues.apache.org/jira/browse/HUDI-1867
[9] https://issues.apache.org/jira/browse/HUDI-1852
[10] https://issues.apache.org/jira/browse/HUDI-1821
[11] https://issues.apache.org/jira/browse/HUDI-1880
[12] https://issues.apache.org/jira/browse/HUDI-1878

=======================================
Bugs

[Core] Fixing kafka native config param for auto offset reset [1]
[Core] rollback pending clustering even if there is greater commit [2]
[Flink Integration] Fix cannot create table due to jar conflict [3]
[Hive Integration] Exception Throws When Sync Non-Partitioned Table To Hive With MultiPartKeysValueExtractor [4]
[Spark Integration] Fix getting incorrect partition path while using incr query by spark-sql [5]
[Flink Integration] Fix Flink streaming reader throws ClassCastException [6]
[Flink Integration] When query incr view of mor table which has Multi level partitions, the query failed [7]
[Core] wiring in Hadoop Conf with AvroSchemaConverters instantiation [8]
[Hive Integration] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false [9]

[1] https://issues.apache.org/jira/browse/HUDI-1835
[2] https://issues.apache.org/jira/browse/HUDI-1833
[3] https://issues.apache.org/jira/browse/HUDI-1858
[4] https://issues.apache.org/jira/browse/HUDI-1798
[5] https://issues.apache.org/jira/browse/HUDI-1801
[6] https://issues.apache.org/jira/browse/HUDI-1781
[7] https://issues.apache.org/jira/browse/HUDI-1718
[8] https://issues.apache.org/jira/browse/HUDI-1876
[9] https://issues.apache.org/jira/browse/HUDI-1759

=======================================
Tests

[Tests] Fix TestHoodieRealtimeRecordReader [1]
[Tests] Fix azure setting for integ tests [2]
[Tests] Fix Metrics UT [3]

[1] https://issues.apache.org/jira/browse/HUDI-1811
[2] https://issues.apache.org/jira/browse/HUDI-1810
[3] https://issues.apache.org/jira/browse/HUDI-1620

Best,
Leesf