Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

2020-10-11 Thread wangxianghu
Hi Rui
This article may answer your question:
https://docs.google.com/document/d/1bYYPg3OJvAivTCVf9-2hBq1BT6jS7s64lyuMuM4ATV4/edit#heading=h.qn6yq5t0ot50

Chinese version: https://mp.weixin.qq.com/s/LvKaj5ytk6imEU5Dc1Sr5Q

> On Oct 10, 2020, at 9:16 PM, Rui Li wrote:
> 
> Thanks for pointing me to the RFC! When using Spark to write a table, we
> need to launch several Spark jobs, e.g. to search index and tag locations,
> workload profiling, etc. Now RFC-13 aims to encapsulate all these in a
> single Flink DAG, right? Do we have plans about how to achieve this?
> 
> On Tue, Sep 29, 2020 at 9:40 AM 王**  wrote:
> 
>> Hi Rui
>> Thanks for asking. The design for the Flink integration can be found here:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520
>> Please ping me if you have any questions.
>> 
>> 
>> At 2020-09-28 20:43:22, "Rui Li"  wrote:
>>> Hello,
>>> 
>>> Very excited to see the on-going efforts for Flink integration. I wonder
>>> whether there's a design doc for this feature? I would like to learn more
>>> and hopefully to make some contributions.
>>> 
>>> On Fri, Sep 25, 2020 at 6:27 AM nishith agarwal wrote:
>>>
>>>> Yes, we have some ideas around schema evolution and have discussed with
>>>> Balaji before as well. I'm going to put these thoughts down and share it on
>>>> the cWiki for all of us to jam. Realistically, I don't think we can hit in
>>>> 0.7.0. We already have a pretty strong list of items for 0.7.0.
>>>>
>>>> Spark 3 SQL syntax like MERGE will definitely boost usability!
>>>>
>>>> Thanks,
>>>> Nishith
>>>>
>>>> On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar wrote:
>>>>
>>>>> On schema evolution, Nishith and Balaji were both thinking about this.
>>>>> Maybe there is a proposal in the works?
>>>>> I would guess we will not be able to hit it in 0.7.0 though. Maybe by the
>>>>> end of year/0.8.0?
>>>>>
>>>>> Tanu, thanks for the kind words! def, if we pull together, we will reach
>>>>> there sooner. Looking forward to more contributions! :)
>>>>>
>>>>>> We were actually thinking of moving to Spark 3.0 but thought it’s too
>>>>>> early with 0.6 release. Is 0.6 not fully tested with Spark 3.0 ?
>>>>> That's correct. There is a PR already open for this. We expect this to be
>>>>> fixed in 0.6.1 shortly, and we will unlock Spark 3.0 support.
>>>>>
>>>>> 0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
>>>>> have had this either had an unfair head start or built ahead with Spark 3
>>>>> in mind. :))
>>>>> We will close this gap.
>>>>>
>>>>> On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <xu.shiyan.raym...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 on the full schema evolution support. May I know which ticket this is
>>>>>> related to? thanks.
>>>>>>
>>>>>> On Wed, Sep 23, 2020 at 5:20 AM leesf wrote:
>>>>>>
>>>>>>> Thanks Vinoth. We would also consider supporting full schema evolution
>>>>>>> (such as dropping some fields) in Hudi in 0.7.0, since right now Hudi
>>>>>>> follows Avro schema compatibility.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2020 at 12:38 PM tanu dua wrote:
>>>>>>>
>>>>>>>> Thanks Vinoth. These are really exciting items and hats off to you and
>>>>>>>> team in pushing the releases swiftly and improving the framework all the
>>>>>>>> time. I hope someday I will start contributing once I get free from my
>>>>>>>> major deliverables and have more understanding of the nitty-gritty
>>>>>>>> details of Hudi.
>>>>>>>>
>>>>>>>> You have mentioned Spark 3.0 support in the next release. We were actually
>>>>>>>> thinking of moving to Spark 3.0 but thought it’s too early with the 0.6
>>>>>>>> release. Is 0.6 not fully tested with Spark 3.0?
>>>>>>>>
>>>>>>>> On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <vin...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> Pursuant to our conversation around release planning, I am happy to share
>>>>>>>>> the initial set of proposals for the next minor/major releases (minor
>>>>>>>>> release ofc can go out based on time)
>>>>>>>>>
>>>>>>>>> *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..)*
>>>>>>>>>
>>>>>>>>> Flink/Writer common refactoring for Flink
>>>>>>>>>
>>>>>>>>> Small file handling support w/o caching
>>>>>>>>>
>>>>>>>>> Spark3 Support
>>>>>>>>>
>>>>>>>>> Remaining bootstrap items
>>>>>>>>>

[ANNOUNCE] Hudi Community Bi-Weekly Update (2020-09-27 ~ 2020-10-11)

2020-10-11 Thread leesf
Dear community,

Happy to share the Hudi community bi-weekly update for 2020-09-27 ~ 2020-10-11,
with updates on features, bug fixes, and tests.

===
Features

[Hive Sync] Make create hive database automatically configurable [1]
[Writer Core] Deltastreamer Kafka consumption delay reporting indicators [2]
[Writer Core] Introduce REPLACE top level action. Implement
insert_overwrite operation on top of replace action [3]
[Writer Core] Refactor hudi-client to support multi-engine [4]
[Metrics] Added an API to shutdown and remove the metrics reporter [5]
[Writer Core] Add port configuration for EmbeddedTimelineService [6]
[Spark Integration] Support schema version when querying a Hudi dataset in
Spark INCREMENTAL mode [7]



===
Bugs

[Writer Core] Avoid blank file created by HoodieLogFormatWriter [8]
[Writer Core] Relocated Jetty in hudi-utilities-bundle pom [9]
[Writer Core] Ordering field should be optional when precombine is turned
off [10]
[DeltaStreamer] DeltaStreamer can now fetch schema before every run in
continuous mode [11]



===
Tests

[Test] Some improvements for the HUDI Test Suite [12]
[Test] Migrate HoodieTestUtils APIs to HoodieTestTable [13]


[1] https://issues.apache.org/jira/browse/HUDI-1192
[2] https://issues.apache.org/jira/browse/HUDI-1233
[3] https://issues.apache.org/jira/browse/HUDI-1072
[4] https://issues.apache.org/jira/browse/HUDI-1089
[5] https://issues.apache.org/jira/browse/HUDI-1305
[6] https://issues.apache.org/jira/browse/HUDI-1203
[7] https://issues.apache.org/jira/browse/HUDI-1301
[8] https://issues.apache.org/jira/browse/HUDI-840
[9] https://issues.apache.org/jira/browse/HUDI-1199
[10] https://issues.apache.org/jira/browse/HUDI-1208
[11] https://issues.apache.org/jira/browse/HUDI-603
[12] https://issues.apache.org/jira/browse/HUDI-1303
[13] https://issues.apache.org/jira/browse/HUDI-995


Best,
Leesf


Re: Adding recent talks to site

2020-10-11 Thread Sivabalan
For ApacheCon, I was actually waiting for the video recording to be available
on YouTube so that I can link both the slide deck and the video. I will wait
for a couple of days; if it is still not up, I will just attach the slide deck.

On Thu, Oct 8, 2020 at 2:46 PM Vinoth Chandar  wrote:

> Hello all,
>
> Can you please add any recent talks to our powered by/talks page?
> I know we had an ApacheCon talk and maybe one more?
>
> I opened https://github.com/apache/hudi/pull/2155 for the PrestoCon panel
>
> Thanks
> Vinoth
>


-- 
Regards,
-Sivabalan