Re: Welcome our PMC Member, Raymond Xu

2021-07-18 Thread vbal...@apache.org
Hearty congratulations, Raymond!

Balaji.V

On Sunday, July 18, 2021, 12:48:25 AM PDT, Dianjin Wang wrote:

Congratulations!

Best,
Dianjin Wang


On Sat, Jul 17, 2021 at 8:28 AM Vinoth Chandar wrote:

> Folks,
>
> I am incredibly happy to share the addition of Raymond Xu to the Hudi PMC.
> Raymond has been a valuable member of our community over the past few
> years, always hustling and taking on the most underappreciated but
> extremely valuable aspects of the project, most recently getting our
> tests working smoothly on Azure CI!
>
> Please join me in congratulating Raymond!
>
> Onwards,
> Vinoth
>
  

[ANNOUNCE] Hudi Community Update (2021-07-04 ~ 2021-07-18)

2021-07-18 Thread leesf
Dear community,

I am pleased to share the Hudi community bi-weekly update for
2021-07-04 ~ 2021-07-18, covering features, bug fixes and tests.


===
Features

[Hive Integration] Support batch synchronization of partition data to the Hive
metastore to avoid OOM problems [1]
[Spark Integration] Support incremental query for
insert_overwrite_table/insert_overwrite operations on COW tables [2]
[Hive Integration] Support Hive1 metadata sync for the Flink writer [3]
[Core] Implement RockDbBasedMap as an alternative to DiskBasedMap in
ExternalSpillableMap [4]
[Flink Integration] Add compaction schedule option for Flink [5]
[Deltastreamer] Added deltastreamer metric for time of lastSync [6]
[Core] Add support for providing basic auth credentials for the
Confluent Cloud schema registry [7]
[Spark Integration] Adding support for UserDefinedPartitioners and
SortModes to BulkInsert with Rows [8]
[Spark Integration] Adding dedup support for Bulk Insert w/ Rows [9]
[Flink Integration] Support Append only in Flink stream [10]
[Spark Integration] Support async clustering for deltastreamer and Spark
streaming [11]
[Spark Integration] Support Read Hoodie As DataSource Table For Flink And
DeltaStreamer [12]
[Flink Integration] Support Read Log Only MOR Table For Spark [13]
[Flink Integration] Add parallelism conf for bootstrap operator [14]
[Core] Support reading logs for MOR Hive rt table [15]
[Flink Integration] Support Transformer for HoodieFlinkStreamer [16]
[Core] Implement compression for DiskBasedMap in Spillable Map [17]
[Core] Make callback return HoodieWriteStat [18]


[1] https://issues.apache.org/jira/browse/HUDI-2116
[2] https://issues.apache.org/jira/browse/HUDI-2058
[3] https://issues.apache.org/jira/browse/HUDI-2133
[4] https://issues.apache.org/jira/browse/HUDI-2028
[5] https://issues.apache.org/jira/browse/HUDI-2094
[6] https://issues.apache.org/jira/browse/HUDI-2055
[7] https://issues.apache.org/jira/browse/HUDI-1996
[8] https://issues.apache.org/jira/browse/HUDI-1104
[9] https://issues.apache.org/jira/browse/HUDI-1105
[10] https://issues.apache.org/jira/browse/HUDI-2087
[11] https://issues.apache.org/jira/browse/HUDI-1483
[12] https://issues.apache.org/jira/browse/HUDI-2045
[13] https://issues.apache.org/jira/browse/HUDI-2107
[14] https://issues.apache.org/jira/browse/HUDI-2171
[15] https://issues.apache.org/jira/browse/HUDI-1969
[16] https://issues.apache.org/jira/browse/HUDI-2165
[17] https://issues.apache.org/jira/browse/HUDI-2029
[18] https://issues.apache.org/jira/browse/HUDI-1633

===
Bugs

[Flink Integration] The coordinator sends events to the write function when
there is no data for the checkpoint [1]
[Core] Initialize the maxMemorySizeInBytes in log scanner [2]
[Flink Integration] StreamerUtil.medianInstantTime should return a valid
datetime string [3]
[Spark Integration] Exception Thrown When MergeInto With Decimal Type
Field [4]
[Core] Improvement in packaging inserts into small files [5]
[Flink Integration] Make coordinator events POJOs for efficient
serialization [6]
[Flink Integration] Fix Flink batch compaction bug when the user doesn't set
compaction tasks [7]
[Metadata Table] Fix the bug that the metadata table cannot support
non-partitioned tables [8]
[Core] Loaded too many classes like
sun/reflect/GeneratedSerializationConstructorAccessor in JVM metaspace [9]
[Flink Integration] Fix empty avro schema path caused by duplicate
parameters [10]
[Spark Integration] Incorrect Schema Inference For Schema Evolved Table [11]
[Metadata Table] Fixed bootstrap of Metadata Table when some actions are
in progress [12]
[Core] FileSlices in the filegroup are not in descending order by timestamp [13]
[Flink Integration] Refactored String constants [14]
[Core] Add generics to avoid forced conversion in
BaseSparkCommitActionExecutor#partition [15]
[Spark Integration] Fixing extra commit metadata in row writer path [16]
[Core] A Hive lock whose state is WAITING should be released; otherwise the
lock will be held forever [17]
[Flink Integration] Fix conflict when flink-sql-connector-hive and
hudi-flink-bundle are both in flink lib [18]
[Spark Integration] Tweak the default compaction target IO to 500GB when
flink async compaction is off [19]
[Flink Integration] Support setting bucket assign parallelism for flink
write task [20]
[Flink Integration] Bug fix: offline clustering (HoodieClusteringJob) causes
insert actions to lose data [21]
[Flink Integration] Fix for AccessControlException for anonymous user [22]
[Spark Integration] Fix Compile Error For Spark3 [23]
[Spark Integration] Ensure and Audit docs for every configuration class in
the codebase [24]
[Spark Integration] Fix BucketAssignFunction Context NullPointerException
[25]
[Spark Integration] Remove the default parallelism of index bootstrap and
bucket assigner [26]


[1] https://issues.apache.org/jira/browse/HUDI-2126
[2] https://issues.apache.org/jira/browse/HUDI-2127
[3] 

Re: Welcome New Committers: Pengzhiwei and DannyChan

2021-07-18 Thread Dianjin Wang
Congratulations!

Best,
Dianjin Wang


On Fri, Jul 16, 2021 at 6:38 PM leesf wrote:

> Hi all,
>
> Please join me in congratulating our newest committers, *Pengzhiwei* and
> *DannyChan*.
>
> *Pengzhiwei* has been a consistent contributor to Hudi. He has contributed
> numerous features, such as Spark SQL integration with Hudi, a Spark
> Structured Streaming source for Hudi, and a Spark FileIndex for Hudi, along
> with many other good contributions around Spark. He is also very active in
> answering users' questions. He is a solid team player and an asset to the
> project.
>
> *DannyChan* has contributed many good features, such as a new streaming
> write pipeline for Flink with automatic compaction and cleaning (COW and
> MOR), batch and streaming readers for Flink (COW and MOR), and Flink SQL
> connectors (reader and writer). He is active on the mailing list answering
> users' questions, wrote a Hudi Flink integration guide, and launched a live
> show to promote Hudi Flink integration for Chinese users.
>
> Thanks so much for your continued contributions to make Hudi better and
> better!
>
> I would also like to introduce the current state of Hudi in China. With the
> help of all community members, Hudi is becoming more and more popular in
> China and has been adopted by almost all top companies there, including
> Alibaba, Baidu, ByteDance, Huawei, Tencent and others, from startups to
> large companies, with data scales from TB to PB. You can find the logo wall
> below (PS: *unofficial statistics*; only some companies are listed, and you
> can contact me to add your company logo if you wish).
>
> We could not have achieved this without such a good community and the
> contributions of all community members. Cheers and Go!
>
> [image: poweredby-0706.png]
>
> Thanks,
> Leesf
>

