Dear community,

I'm pleased to share the Hudi community bi-weekly update for 2021-08-01 ~ 2021-08-15, covering features, bug fixes, and tests.

=======================================
Features

[Examples] Add a compaction job in hudi-examples [1]
[Core] Add pre-commit validator framework [2]
[Spark Integration] Support metadata based listing for Spark DataSource and Spark SQL [3]
[Hive Integration] Metadata table for flink [4]
[Spark Integration] Use HMS To Sync Hive Meta For Spark Sql [5]
[Flink Integration] Allows INSERT duplicates for Flink MOR table [6]
[Flink Integration] Use INT64 timestamp with precision 3 for flink parquet writer [7]
[Spark Integration] Support Compaction Command For Spark Sql [8]
[Core] Support custom clustering strategies and preserve commit metadata as part of clustering [9]
[Flink Integration] Spark Sql Support For pre-existing Hoodie Table [10]
[Spark Integration] Support Time Travel Query For Hoodie Table [11]
[Spark Integration] Support Bulk Insert For Spark Sql [12]
[Spark Integration] Skip the latest N partitions when choosing partitions to create ClusteringPlan [13]
[Flink Integration] Propagate CDC format for hoodie [14]
[Core] Support storage on ks3 for hudi [15]
[DeltaStreamer] Adding support for delete_partitions to spark data source [16]
[Core] Add timeline-server-based marker file strategy for improving marker-related latency [17]
[Core] Add API to set a metric in the registry [18]
[Core] Adding virtual keys support to deltastreamer [19]
[Spark Integration] Support column name matching for insert * and update set * in merge into [20]
[Core] Provide option to drop partition columns [21]
[DeltaStreamer] Deltastreamer source for AWS S3 [22]
[Core] Add upgrade and downgrade to and from 0.9.0 [23]

[1] https://issues.apache.org/jira/browse/HUDI-2225
[2] https://issues.apache.org/jira/browse/HUDI-2072
[3] https://issues.apache.org/jira/browse/HUDI-1893
[4] https://issues.apache.org/jira/browse/HUDI-2258
[5] https://issues.apache.org/jira/browse/HUDI-2233
[6] https://issues.apache.org/jira/browse/HUDI-2274
[7] https://issues.apache.org/jira/browse/HUDI-2278
[8] https://issues.apache.org/jira/browse/HUDI-2182
[9] https://issues.apache.org/jira/browse/HUDI-1468
[10] https://issues.apache.org/jira/browse/HUDI-1842
[11] https://issues.apache.org/jira/browse/HUDI-2243
[12] https://issues.apache.org/jira/browse/HUDI-2208
[13] https://issues.apache.org/jira/browse/HUDI-2194
[14] https://issues.apache.org/jira/browse/HUDI-1771
[15] https://issues.apache.org/jira/browse/HUDI-2288
[16] https://issues.apache.org/jira/browse/HUDI-1774
[17] https://issues.apache.org/jira/browse/HUDI-1138
[18] https://issues.apache.org/jira/browse/HUDI-2017
[19] https://issues.apache.org/jira/browse/HUDI-2294
[20] https://issues.apache.org/jira/browse/HUDI-2279
[21] https://issues.apache.org/jira/browse/HUDI-1363
[22] https://issues.apache.org/jira/browse/HUDI-1897
[23] https://issues.apache.org/jira/browse/HUDI-2268

=======================================
Bugs

[Flink Integration] Release the disk map resource for flink streaming reader [1]
[Hive Integration] Pass base file format to sync clients [2]
[Spark Integration] Refactor Datasource options [3]
[Core] Ensure Disk Maps create a subfolder with appropriate prefixes and cleans them up on close [4]
[Spark Integration] MERGE INTO fails with table having nested struct [5]
[Flink Integration] Filter file where length less than parquet MAGIC length [6]
[Core] Improving schema evolution support in hudi [7]
[Flink Integration] Compare the field object directly in OverwriteWithLatestAvroPayload [8]
[Spark Integration] Always choose the latest record for HoodieRecordPayload [9]
[Spark Integration] remove joda time in hivesync module [10]
[Core] MOR should not predicate pushdown when reading with payload_combine type [11]
[Core] Handle the case of failed deltacommit on the metadata table. [12]
[Core] The HoodieMergedLogRecordScanner should set up the operation of the chosen record [13]
[Core] Remove the logic that delete replaced file when archive [14]
[Core] Created a config to enable/disable syncing of metadata table [15]
[Core] Flipping defaults [16]
[Core] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant [17]
[Core] When using delete_partition with ds should not rely on the primary key [18]
[Core] Add MARKERS.type and fix marker-based rollback [19]

[1] https://issues.apache.org/jira/browse/HUDI-2269
[2] https://issues.apache.org/jira/browse/HUDI-2272
[3] https://issues.apache.org/jira/browse/HUDI-2255
[4] https://issues.apache.org/jira/browse/HUDI-2090
[5] https://issues.apache.org/jira/browse/HUDI-2232
[6] https://issues.apache.org/jira/browse/HUDI-2247
[7] https://issues.apache.org/jira/browse/HUDI-1129
[8] https://issues.apache.org/jira/browse/HUDI-2042
[9] https://issues.apache.org/jira/browse/HUDI-1763
[10] https://issues.apache.org/jira/browse/HUDI-1939
[11] https://issues.apache.org/jira/browse/HUDI-2292
[12] https://issues.apache.org/jira/browse/HUDI-2286
[13] https://issues.apache.org/jira/browse/HUDI-2298
[14] https://issues.apache.org/jira/browse/HUDI-1518
[15] https://issues.apache.org/jira/browse/HUDI-1292
[16] https://issues.apache.org/jira/browse/HUDI-2151
[17] https://issues.apache.org/jira/browse/HUDI-2119
[18] https://issues.apache.org/jira/browse/HUDI-2307
[19] https://issues.apache.org/jira/browse/HUDI-2305

=======================================
Tests

[Tests] Migrating some long running tests to functional test profile [1]

[1] https://issues.apache.org/jira/browse/HUDI-2273

Best,
Leesf