Dear community,

Nice to share the Hudi community bi-weekly updates for 2021-06-06 ~ 2021-06-20, covering features, bug fixes, and tests.
=======================================
Features

[CLI] Add fetching latest schema to table command in hudi-cli [1]
[Spark Integration] Added support for SqlFileBasedTransformer [2]
[Flink Integration] Add BootstrapFunction to support index bootstrap [3]
[Spark Integration] Basic Implement Of Spark Sql Support For Hoodie [4]
[Core] Support configure KeyGenerator by type [5]
[Spark Integration] Added SqlSource to fetch data from any partitions for backfill use case [6]
[Flink Integration] Support independent flink hudi compaction function [7]
[Core] ORC reader writer Implementation [8]
[Flink Integration] Support flink hive sync in batch mode [9]
[Flink Integration] Add metadata cache to WriteProfile to reduce IO [10]
[Flink Integration] Make flink writer as exactly-once by default [11]
[DeltaStreamer] Adds JDBC source support for DeltaStreamer [12]

[1] https://issues.apache.org/jira/browse/HUDI-1914
[2] https://issues.apache.org/jira/browse/HUDI-1743
[3] https://issues.apache.org/jira/browse/HUDI-1924
[4] https://issues.apache.org/jira/browse/HUDI-1659
[5] https://issues.apache.org/jira/browse/HUDI-1929
[6] https://issues.apache.org/jira/browse/HUDI-1790
[7] https://issues.apache.org/jira/browse/HUDI-1984
[8] https://issues.apache.org/jira/browse/HUDI-765
[9] https://issues.apache.org/jira/browse/HUDI-2014
[10] https://issues.apache.org/jira/browse/HUDI-2030
[11] https://issues.apache.org/jira/browse/HUDI-2040
[12] https://issues.apache.org/jira/browse/HUDI-251

=======================================
Bugs

[Spark Integration] Add Default value for HIVE_AUTO_CREATE_DATABASE_OPT_KEY in HoodieSparkSqlWriter [1]
[Flink Integration] BucketAssignFunction use ValueState instead of MapState [2]
[Flink Integration] Skip Commits with empty files [3]
[Core] Fix NPE when avro field value is null [4]
[Flink Integration] Skip creating marker files for flink merge handle [5]
[Flink Integration] Fix non partition table hive meta sync for flink writer [6]
[Flink Integration] Release the new records map for merge handle #close [7]
[Flink Integration] Release the new records iterator for append handle #close [8]
[Flink Integration] Release file writer for merge handle #close [9]
[Spark Integration] Fixing drop dups exception in bulk insert row writer path [10]
[Flink Integration] Refresh the base file view cache for WriteProfile [11]
[Flink Integration] Release writer for append handle #close [12]
[Code Cleanup] Avoid the raw type usage in some classes under hudi-utilities module [13]
[Core] Fix the filter condition is missing in the judgment condition of compaction instance [14]
[Flink Integration] Fix flink operator uid to allow multiple pipelines in one job [15]
[Spark Integration] Fix RO Tables Returning Snapshot Result [16]
[Spark Integration] Set up the file system view storage config for singleton embedded server write config every time [17]
[Flink Integration] Make keygen class and keygen type optional for FlinkStreamerConfig [18]
[Spark Integration] ClassCastException Throw When PreCombineField Is String Type [19]
[Flink Integration] Move the compaction plan scheduling out of flink writer coordinator [20]

[1] https://issues.apache.org/jira/browse/HUDI-1942
[2] https://issues.apache.org/jira/browse/HUDI-1931
[3] https://issues.apache.org/jira/browse/HUDI-1909
[4] https://issues.apache.org/jira/browse/HUDI-1895
[5] https://issues.apache.org/jira/browse/HUDI-1723
[6] https://issues.apache.org/jira/browse/HUDI-1987
[7] https://issues.apache.org/jira/browse/HUDI-1992
[8] https://issues.apache.org/jira/browse/HUDI-1994
[9] https://issues.apache.org/jira/browse/HUDI-2000
[10] https://issues.apache.org/jira/browse/HUDI-1991
[11] https://issues.apache.org/jira/browse/HUDI-1999
[12] https://issues.apache.org/jira/browse/HUDI-2022
[13] https://issues.apache.org/jira/browse/HUDI-2008
[14] https://issues.apache.org/jira/browse/HUDI-1955
[15] https://issues.apache.org/jira/browse/HUDI-2015
[16] https://issues.apache.org/jira/browse/HUDI-1879
[17] https://issues.apache.org/jira/browse/HUDI-2019
[18] https://issues.apache.org/jira/browse/HUDI-2032
[19] https://issues.apache.org/jira/browse/HUDI-2033
[20] https://issues.apache.org/jira/browse/HUDI-2036

=======================================
Tests

[Tests] Move TestHiveMetastoreBasedLockProvider to functional [1]
[Tests] Move CheckpointUtils test cases to independent class [2]
[Tests] Fix Azure CI failure in TestParquetUtils [3]

[1] https://issues.apache.org/jira/browse/HUDI-1950
[2] https://issues.apache.org/jira/browse/HUDI-2004
[3] https://issues.apache.org/jira/browse/HUDI-1950

Best,
Leesf