Dear community,

Glad to share the Hudi community bi-weekly updates for 2022-01-16 ~ 2022-02-13, covering features, bug fixes and tests.
=======================================
Features

[Spark] Structured Streaming source support for Spark 3 [1]
[Spark] Add support for allowDuplicateInserts in HoodieJavaClient [2]
[Core] Adding support for Parquet in MOR table log blocks [3]
[Core] New ScheduleAndExecute mode for HoodieCompactor and hudi-cli [4]
[Flink] Bump Flink version to 1.14.3 [5]
[Spark] Adding inline scheduling support for compaction and clustering in the Spark datasource path [6]
[Core] Adding restore.requested instant and restore plan for the restore action [7]

[1] https://issues.apache.org/jira/browse/HUDI-1558
[2] https://issues.apache.org/jira/browse/HUDI-2417
[3] https://issues.apache.org/jira/browse/HUDI-431
[4] https://issues.apache.org/jira/browse/HUDI-3369
[5] https://issues.apache.org/jira/browse/HUDI-3389
[6] https://issues.apache.org/jira/browse/HUDI-1847
[7] https://issues.apache.org/jira/browse/HUDI-2432

=======================================
Bugs

[Core] Extracted common AbstractHoodieTableFileIndex to be shared across engines [1]
[Core] Excluding clustering instants from pending rollback info [2]
[Core] Fix MOR snapshot query during compaction [3]
[Core] Avoid creating empty requestedReplaceCommit in the startCommit method [4]
[Core] Reading rt table via Hive CLI throws NoSuchMethodError [5]
[Core] Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE [6]
[Core] Get table schema from the last commit with data written [7]
[Core] Convert uppercase letters to lowercase in storage configs [8]
[Core] Rebasing Hive's FileInputFormat onto `AbstractHoodieTableFileIndex` [9]
[Core] Filter out non-Parquet files in the bootstrap procedure [10]
[Core] Use fields' comments persisted in catalog to fill in schema [11]
[Core] Bootstrap support for overwriting an existing table [12]
[Core] Drop unused method SparkBootstrapCommitActionExecutor#handleMetadataBootstrap [13]
[Core] Tuning performance of getAllPartitionPaths API in FileSystemBackedTableMetadata [14]
[Core] Fix NPE while reading table with Spark datasource [15]
[Core] Add support for using database name in incremental query [16]
[Spark] Fixing read of an empty table with a failed write [17]
[Spark] Fix delete exception for Spark SQL when syncing to Hive [18]
[Core] Fixing conflict resolution in the transaction management code path for auto commit [19]
[Core] Refactoring layout optimization (clustering) flow to support linear ordering [20]
[Core] Gracefully fail when changing column data type [21]
[Core] Rewriting RFC-27 for data skipping index [22]
[Core] Metadata table records - support for key deduplication based on hardcoded key field [23]
[Core] Make class names consistent in hudi-client [24]
[Core] [RFC-40] A new Hudi connector for Trino [25]
[Core] Complete pending clustering before DeltaStreamer sync [26]
[Core] Fix Hudi CLI temp view query issue [27]
[Core] Prefer using the table's own location [28]
[Core] [RFC-46] Optimize Record Payload handling [29]
[Core] Enabling lazy read by default for log blocks during compaction [30]
[Core] Fall back to full table scan for IncrementalRelation if underlying files have been cleared or moved by cleaner [31]
[Core] Fixing non-existent marker dir handling in TwoToOne downgrade [32]
[Core] Fixing default value for clustering small file config to 300MB [33]
[Core] RFC-37: Metadata table based bloom index [34]
[Core] Fixing Metadata Table Records Duplication Issues [35]
[Core] Fixing Parquet Column Range metadata extraction [36]
[Core] Metadata Index - Bloom filter and Column stats index to speed up index lookups [37]
[Core] Removing duplicate file-listing process within Hive's MOR `FileInputFormat`s [38]
[Core] Generalize HoodieIndex for flexible record data type [39]
[Core] Expose HMS mode metastore URI config option for Spark writer [40]
[Deltastreamer] Adding retries to DeltaStreamer for source errors [41]
[Core] Show _hoodie_operation in Spark SQL results [42]
[Core] Unify Hive's MOR implementations to avoid duplication [43]
[Core] Simplify Precommit file system view [44]
[Core] Add zero value metrics for empty data source and PROMETHEUS_PUSHGATEWAY reporter [45]
[Core] Hoodie metadata table validator [46]
[Core] Making SIMPLE index the default index type [47]
[Core] Fixing missing begin checkpoint in HoodieIncremental pull [48]
[Core] Rebased Parquet-based FileInputFormat impls to inherit from `MapredParquetInputFormat` [49]
[Core] Converting BaseHoodieTableFileIndex to Java [50]
[Core] Fix getNestedFieldVal breaking with Spark 3.2 [51]
[CLI] Allow passing rollbackUsingMarkers to Hudi CLI rollback command [52]
[Spark] Pass the Spark version when syncing the table created by Spark [53]
[Core] Update all deprecated calls to new APIs in HoodieRecordPayload [54]
[Core] Set TIMESTAMP_MICROS as the default value for hoodie.parquet.outputtimestamptype [55]
[Core] Custom relation instead of HadoopFsRelation [56]
[Core] Fix restore to roll back pending clustering operations followed by rolling back other commits [57]
[Deltastreamer] Fix Jackson parse error on empty messages from JsonKafkaSource when using HoodieDeltaStreamer [58]

[1] https://issues.apache.org/jira/browse/HUDI-3179
[2] https://issues.apache.org/jira/browse/HUDI-3257
[3] https://issues.apache.org/jira/browse/HUDI-3194
[4] https://issues.apache.org/jira/browse/HUDI-3252
[5] https://issues.apache.org/jira/browse/HUDI-3261
[6] https://issues.apache.org/jira/browse/HUDI-3263
[7] https://issues.apache.org/jira/browse/HUDI-2903
[8] https://issues.apache.org/jira/browse/HUDI-3245
[9] https://issues.apache.org/jira/browse/HUDI-3191
[10] https://issues.apache.org/jira/browse/HUDI-3277
[11] https://issues.apache.org/jira/browse/HUDI-3236
[12] https://issues.apache.org/jira/browse/HUDI-3283
[13] https://issues.apache.org/jira/browse/HUDI-3285
[14] https://issues.apache.org/jira/browse/HUDI-3281
[15] https://issues.apache.org/jira/browse/HUDI-3268
[16] https://issues.apache.org/jira/browse/HUDI-2837
[17] https://issues.apache.org/jira/browse/HUDI-1850
[18] https://issues.apache.org/jira/browse/HUDI-3282
[19] https://issues.apache.org/jira/browse/HUDI-3072
[20] https://issues.apache.org/jira/browse/HUDI-2872
[21] https://issues.apache.org/jira/browse/HUDI-3237
[22] https://issues.apache.org/jira/browse/HUDI-1822
[23] https://issues.apache.org/jira/browse/HUDI-2763
[24] https://issues.apache.org/jira/browse/HUDI-2596
[25] https://issues.apache.org/jira/browse/HUDI-2688
[26] https://issues.apache.org/jira/browse/HUDI-2943
[27] https://issues.apache.org/jira/browse/HUDI-1977
[28] https://issues.apache.org/jira/browse/HUDI-3253
[29] https://issues.apache.org/jira/browse/HUDI-3318
[30] https://issues.apache.org/jira/browse/HUDI-3292
[31] https://issues.apache.org/jira/browse/HUDI-2711
[32] https://issues.apache.org/jira/browse/HUDI-3346
[33] https://issues.apache.org/jira/browse/HUDI-3293
[34] https://issues.apache.org/jira/browse/HUDI-2589
[35] https://issues.apache.org/jira/browse/HUDI-3322
[36] https://issues.apache.org/jira/browse/HUDI-3337
[37] https://issues.apache.org/jira/browse/HUDI-1295
[38] https://issues.apache.org/jira/browse/HUDI-3191
[39] https://issues.apache.org/jira/browse/HUDI-2656
[40] https://issues.apache.org/jira/browse/HUDI-2491
[41] https://issues.apache.org/jira/browse/HUDI-3360
[42] https://issues.apache.org/jira/browse/HUDI-2941
[43] https://issues.apache.org/jira/browse/HUDI-3206
[44] https://issues.apache.org/jira/browse/HUDI-3058
[45] https://issues.apache.org/jira/browse/HUDI-3373
[46] https://issues.apache.org/jira/browse/HUDI-3320
[47] https://issues.apache.org/jira/browse/HUDI-3091
[48] https://issues.apache.org/jira/browse/HUDI-3361
[49] https://issues.apache.org/jira/browse/HUDI-3276
[50] https://issues.apache.org/jira/browse/HUDI-3239
[51] https://issues.apache.org/jira/browse/HUDI-3333
[52] https://issues.apache.org/jira/browse/HUDI-3395
[53] https://issues.apache.org/jira/browse/HUDI-2610
[54] https://issues.apache.org/jira/browse/HUDI-2987
[55] https://issues.apache.org/jira/browse/HUDI-3402
[56] https://issues.apache.org/jira/browse/HUDI-3338
[57] https://issues.apache.org/jira/browse/HUDI-3362
[58] https://issues.apache.org/jira/browse/HUDI-3413

===================================
Tests

[Tests] Add UT for update/delete on non-PK condition [1]
[Tests] Fixing utilities and integ test suite bundle to include Hudi Spark datasource [2]
[Tests] Fix UTs for Spark 3.2 [3]
[Tests] Remove fixture test tables for multi-writer tests [4]
[Tests] Fixing Spark yaml and adding Hive validation to integ test suite [5]

[1] https://issues.apache.org/jira/browse/HUDI-2968
[2] https://issues.apache.org/jira/browse/HUDI-3262
[3] https://issues.apache.org/jira/browse/HUDI-3215
[4] https://issues.apache.org/jira/browse/HUDI-3330
[5] https://issues.apache.org/jira/browse/HUDI-3312

Best,
Leesf