Dear community,

I'm happy to share the Hudi community bi-weekly update for 2021-06-20 ~ 2021-07-04, covering features, bug fixes, and tests.

=======================================
Features

[Spark Integration] Support AlterCommand For Hoodie [1]
[Spark Integration] Support Truncate Table For Hoodie [2]
[Utilities] Add ORC support in HoodieSnapshotExporter [3]
[Hive Integration] Add ability to provide multi-region (global) data consistency across HMS in different regions
[DeltaStreamer] Commit Offset to Kafka after successful Hudi commit [5]
[Flink Integration] Support hive style partitioning for flink writer [6]
[Flink Integration] Support specifying compaction parallelism and compaction target io for flink batch compaction [7]
[DeltaStreamer] Support Hudi to read from committed offset [8]
[Flink Integration] Support loading logFile in BootstrapFunction [9]
[Core] Add configOption & refactor all configs based on that [10]
[Spark Integration] Enable Hive Sync When Spark Enable Hive Meta For Spark Sql [11]
[Flink Integration] Support reading pure logs file group for flink batch reader after compaction [12]
[Flink Integration] Add operator uid for flink stateful operators [13]
[Utilities] A Grafana dashboard for HUDI [14]
[Core] Bootstrap supports configuring KeyGenerator by type [15]

[1] https://issues.apache.org/jira/browse/HUDI-1914
[2] https://issues.apache.org/jira/browse/HUDI-1883
[3] https://issues.apache.org/jira/browse/HUDI-1826
[5] https://issues.apache.org/jira/browse/HUDI-2094
[6] https://issues.apache.org/jira/browse/HUDI-1790
[7] https://issues.apache.org/jira/browse/HUDI-2085
[8] https://issues.apache.org/jira/browse/HUDI-1944
[9] https://issues.apache.org/jira/browse/HUDI-2052
[10] https://issues.apache.org/jira/browse/HUDI-89
[11] https://issues.apache.org/jira/browse/HUDI-2051
[12] https://issues.apache.org/jira/browse/HUDI-2112
[13] https://issues.apache.org/jira/browse/HUDI-2121
[14] https://issues.apache.org/jira/browse/HUDI-2124
[15] https://issues.apache.org/jira/browse/HUDI-1930

=======================================
Bugs

[Flink Integration] StreamWriteFunction should wait for the next inflight instant time before flushing [1]
[Flink Integration] Support rollback inflight compaction instances for batch flink compactor [2]
[Core] HoodieDefaultTimeline$filterPendingCompactionTimeline() method has a wrong filter condition [3]
[Core] JVM occasionally crashes during compaction when spark speculative execution is enabled [4]
[Flink Integration] Ignore FileNotFoundException in WriteProfiles#getWritePathsOfInstant [5]
[Core] Removed option to fallback to file listing when Metadata Table is enabled [6]
[Core] Metadata Reader should merge all the un-synced but complete instants from the dataset timeline [7]
[Core] FinalizeWrite() is executed twice in AbstractHoodieWriteClient$commitstats [8]
[Flink Integration] Remove the duplicate name for flink write pipeline [9]
[Flink Integration] Support rollback inflight compaction instances for CompactionPlanOperator [10]
[Spark Integration] Incorrect Schema Inference For Schema Evolved Table [11]
[Spark Integration] Insert Static Partition With DateType Return Incorrect Partition Value [12]
[DeltaStreamer] Fix KafkaAvroSchemaDeserializer to not rely on reflection [13]
[Flink Integration] Catch FileNotFoundException in WriteProfiles#getCommitMetadata Safely [14]
[Core] Fix the bug of hoodieClusteringJob never quitting [15]
[Flink Integration] Use while loop instead of recursive call in MergeOnReadInputFormat#MergeIterator to avoid StackOverflow [16]
[Flink Integration] Sync FlinkOptions config to FlinkStreamerConfig [17]
[Flink Integration] Resend the uncommitted write metadata when start up [18]
[Spark Integration] Fix Flink unable to read commit metadata error [19]
[Flink Integration] Fix NPE caused by FlinkStreamerConfig#writePartitionUrlEncode null value [20]
[Flink Integration] Add rebalance before index bootstrap [21]
[Flink Integration] Missing Partition Fields And PreCombineField In Hoodie Properties For Table Written By Flink [22]
[Spark Integration] Compaction Failed For MergeInto MOR Table [23]
[Spark Integration] Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value [24]
[Spark Integration] Exception When Merge With Null-Value Field [25]
[Spark Integration] CTAS Generate An External Table When Create Managed Table [26]
[Hive Integration] Support batch synchronization of partition data to hive metastore to avoid OOM problem [27]

[1] https://issues.apache.org/jira/browse/HUDI-2049
[2] https://issues.apache.org/jira/browse/HUDI-2050
[3] https://issues.apache.org/jira/browse/HUDI-1909
[4] https://issues.apache.org/jira/browse/HUDI-2031
[5] https://issues.apache.org/jira/browse/HUDI-2047
[6] https://issues.apache.org/jira/browse/HUDI-2013
[7] https://issues.apache.org/jira/browse/HUDI-1717
[8] https://issues.apache.org/jira/browse/HUDI-1988
[9] https://issues.apache.org/jira/browse/HUDI-2054
[10] https://issues.apache.org/jira/browse/HUDI-2038
[11] https://issues.apache.org/jira/browse/HUDI-2061
[12] https://issues.apache.org/jira/browse/HUDI-2053
[13] https://issues.apache.org/jira/browse/HUDI-2069
[14] https://issues.apache.org/jira/browse/HUDI-2062
[15] https://issues.apache.org/jira/browse/HUDI-2073
[16] https://issues.apache.org/jira/browse/HUDI-2074
[17] https://issues.apache.org/jira/browse/HUDI-2067
[18] https://issues.apache.org/jira/browse/HUDI-2084
[19] https://issues.apache.org/jira/browse/HUDI-2097
[20] https://issues.apache.org/jira/browse/HUDI-2092
[21] https://issues.apache.org/jira/browse/HUDI-2103
[22] https://issues.apache.org/jira/browse/HUDI-2088
[23] https://issues.apache.org/jira/browse/HUDI-2105
[24] https://issues.apache.org/jira/browse/HUDI-2114
[25] https://issues.apache.org/jira/browse/HUDI-2123
[26] https://issues.apache.org/jira/browse/HUDI-2057
[27] https://issues.apache.org/jira/browse/HUDI-2116

=======================================
Tests

[Tests] Increase timeout for deltaStreamerTestRunner in TestHoodieDeltaStreamer [1]
[Tests] Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded [2]
[Tests] Added tests for KafkaOffsetGen [3]
[Tests] Move schema util tests out from TestHiveSyncTool [4]
[Tests] Adding more yaml templates to test suite [5]

[1] https://issues.apache.org/jira/browse/HUDI-1248
[2] https://issues.apache.org/jira/browse/HUDI-2064
[3] https://issues.apache.org/jira/browse/HUDI-2060
[4] https://issues.apache.org/jira/browse/HUDI-2081
[5] https://issues.apache.org/jira/browse/HUDI-2006

Best,
Leesf