Re: Welcome our PMC Member, Raymond Xu
Hearty congratulations, Raymond!!

Balaji.V

On Sunday, July 18, 2021, 12:48:25 AM PDT, Dianjin Wang wrote:

Congratulations!

Best,
Dianjin Wang

On Sat, Jul 17, 2021 at 8:28 AM Vinoth Chandar wrote:

> Folks,
>
> I am incredibly happy to share the addition of Raymond Xu to the Hudi PMC.
> Raymond has been a valuable member of our community over the past few
> years now. Always hustlin and taking on the most underappreciated, but
> extremely valuable aspects of the project, most recently with getting our
> tests working smoothly on Azure CI!
>
> Please join me in congratulating Raymond!
>
> Onwards,
> Vinoth
[ANNOUNCE] Hudi Community Update(2021-07-04 ~ 2021-07-18)
Dear community,

Nice to share Hudi community bi-weekly updates for 2021-07-04 ~ 2021-07-18 with updates on features, bug fixes and tests.

=== Features

[Hive Integration] Support batch synchronization of partition data to Hive metastore to avoid OOM problem [1]
[Spark Integration] Support incremental query for insert_overwrite_table/insert_overwrite operation on COW table [2]
[Hive Integration] Support hive1 metadata sync for Flink writer [3]
[Core] Implement RockDbBasedMap as an alternate to DiskBasedMap in ExternalSpillableMap [4]
[Flink Integration] Add compaction schedule option for Flink [5]
[Deltastreamer] Added deltastreamer metric for time of lastSync [6]
[Core] Adding functionality to allow the providing of basic auth creds for Confluent Cloud schema registry [7]
[Spark Integration] Adding support for UserDefinedPartitioners and SortModes to BulkInsert with Rows [8]
[Spark Integration] Adding dedup support for Bulk Insert w/ Rows [9]
[Flink Integration] Support Append only in Flink stream [10]
[Spark Integration] Support async clustering for deltastreamer and Spark streaming [11]
[Spark Integration] Support Read Hoodie As DataSource Table For Flink And DeltaStreamer [12]
[Flink Integration] Support Read Log Only MOR Table For Spark [13]
[Flink Integration] Add parallelism conf for bootstrap operator [14]
[Core] Support reading logs for MOR Hive rt table [15]
[Flink Integration] Support Transformer for HoodieFlinkStreamer [16]
[Core] Implement compression for DiskBasedMap in Spillable Map [17]
[Core] Make callback return HoodieWriteStat [18]

[1] https://issues.apache.org/jira/browse/HUDI-2116
[2] https://issues.apache.org/jira/browse/HUDI-2058
[3] https://issues.apache.org/jira/browse/HUDI-2133
[4] https://issues.apache.org/jira/browse/HUDI-2028
[5] https://issues.apache.org/jira/browse/HUDI-2094
[6] https://issues.apache.org/jira/browse/HUDI-2055
[7] https://issues.apache.org/jira/browse/HUDI-1996
[8] https://issues.apache.org/jira/browse/HUDI-1104
[9] https://issues.apache.org/jira/browse/HUDI-1105
[10] https://issues.apache.org/jira/browse/HUDI-2087
[11] https://issues.apache.org/jira/browse/HUDI-1483
[12] https://issues.apache.org/jira/browse/HUDI-2045
[13] https://issues.apache.org/jira/browse/HUDI-2107
[14] https://issues.apache.org/jira/browse/HUDI-2171
[15] https://issues.apache.org/jira/browse/HUDI-1969
[16] https://issues.apache.org/jira/browse/HUDI-2165
[17] https://issues.apache.org/jira/browse/HUDI-2029
[18] https://issues.apache.org/jira/browse/HUDI-1633

=== Bugs

[Flink Integration] The coordinator sends events to write function when there is no data for the checkpoint [1]
[Core] Initialize the maxMemorySizeInBytes in log scanner [2]
[Flink Integration] StreamerUtil.medianInstantTime should return a valid datetime string [3]
[Spark Integration] Exception Throw Out When MergeInto With Decimal Type Field [4]
[Core] Improvement in packaging insert into smallfiles [5]
[Flink Integration] Make coordinator events as POJO for efficient serialization [6]
[Flink Integration] Fix Flink batch compaction bug when the user doesn't set compaction tasks [7]
[Metadata Table] Fix the bug that metatable cannot support non_partition table [8]
[Core] Loaded too many classes like sun/reflect/GeneratedSerializationConstructorAccessor in JVM metaspace [9]
[Flink Integration] Fix empty avro schema path caused by duplicate parameters [10]
[Spark Integration] Incorrect Schema Inference For Schema Evolved Table [11]
[Metadata Table] Fixed bootstrap of Metadata Table when some actions are in progress [12]
[Core] FileSlices in the filegroup is not descending by timestamp [13]
[Flink Integration] Refactored String constants [14]
[Core] Add generics to avoid forced conversion in BaseSparkCommitActionExecutor#partition [15]
[Spark Integration] Fixing extra commit metadata in row writer path [16]
[Core] Hive lock whose state is WAITING should be released, otherwise this Hive lock will be locked forever [17]
[Flink Integration] Fix conflict when flink-sql-connector-hive and hudi-flink-bundle are both in flink lib [18]
[Spark Integration] Tweak the default compaction target IO to 500GB when Flink async compaction is off [19]
[Flink Integration] Support setting bucket assign parallelism for Flink write task [20]
[Flink Integration] Bug-Fix: Offline clustering (HoodieClusteringJob) will cause insert action losing data [21]
[Flink Integration] Fix for AccessControlException for anonymous user [22]
[Spark Integration] Fix Compile Error For Spark3 [23]
[Spark Integration] Ensure and Audit docs for every configuration class in the codebase [24]
[Spark Integration] Fix BucketAssignFunction Context NullPointerException [25]
[Spark Integration] Remove the default parallelism of index bootstrap and bucket assigner [26]

[1] https://issues.apache.org/jira/browse/HUDI-2126
[2] https://issues.apache.org/jira/browse/HUDI-2127
[3]
Re: Welcome New Committers: Pengzhiwei and DannyChan
Congratulations!

Best,
Dianjin Wang

On Fri, Jul 16, 2021 at 6:38 PM leesf wrote:

> Hi all,
>
> Please join me in congratulating our newest committers *Pengzhiwei* and
> *DannyChan*.
>
> *Pengzhiwei* has been a consistent contributor to Hudi. He has
> contributed numerous features, such as Spark SQL integration with
> Hudi, Spark Structured Streaming Source for Hudi and Spark FileIndex for
> Hudi, along with many other good contributions around Spark, and he is very
> active in answering users' questions. He is a solid team player and an asset
> to the project.
>
> *DannyChan* has contributed many good features, such as the new streaming
> write pipeline for Flink with automatic compaction and cleaning (COW and
> MOR), batch and streaming readers for Flink (COW and MOR) and support for
> Flink SQL connectors (reader and writer). He actively participates on the ML
> and answers users' questions, wrote a Hudi Flink integration guide and
> launched a live show to promote Hudi Flink integration for Chinese users.
>
> Thanks so much for your continued contributions to make Hudi better and
> better!
>
> Also, I would like to introduce the current state of Hudi in China. Hudi
> is becoming more and more popular in China with the help of all community
> members and has been adopted by almost all top companies in China,
> including Alibaba, Baidu, ByteDance, Huawei, Tencent and other companies,
> from startups to large companies, with data scale from TB to PB. You can find
> the logo wall below (PS: *unofficial statistics*, just listed some of them,
> and you can contact me to add your company logo if wanted).
>
> We would not have achieved this without such a good community and the
> contributions of all community members. Cheers and Go!
>
> [image: poweredby-0706.png]
>
> Thanks,
> Leesf
Re: Welcome our PMC Member, Raymond Xu
Congratulations!

Best,
Dianjin Wang

On Sat, Jul 17, 2021 at 8:28 AM Vinoth Chandar wrote:

> Folks,
>
> I am incredibly happy to share the addition of Raymond Xu to the Hudi PMC.
> Raymond has been a valuable member of our community over the past few
> years now. Always hustlin and taking on the most underappreciated, but
> extremely valuable aspects of the project, most recently with getting our
> tests working smoothly on Azure CI!
>
> Please join me in congratulating Raymond!
>
> Onwards,
> Vinoth