date:20200818

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-18 Thread Vinoth Chandar

We still have one blocker: a TimestampKeyGenerator issue with Joda's
DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into it.

In the meantime, here is the progress so far and our testing plan. If folks
in the community can help test the release branch over the next couple of
days, it would be a great help!

Testing plan:

   - Stability: tested by Balaji running Spark Streaming.
   - Correctness: Nishith is testing on the release branch, with Udit
   helping on the EMR setup.
   - Spark datasource MOR, .. marker-based rollback, upgrade/downgrade
   (Vinoth is on this).
   - Bootstrap: Balaji tested it for Hive; Udit has it working for Presto
   on EMR.
   - bulk_insert v2: Vinoth tested correctness and file sizes; performance
   was tested on a cluster and with microbenchmarks.


There will also be some blogs and docs explaining the new features:

   - Blogs: sort modes (Vinoth), simple index (Vinoth), bootstrap (Balaji),
   multi delta-streamer (Pratyaksh), Spark Streaming/DeltaStreamer
   continuous mode (Balaji)
   - Docs: all missing docs will be added by Sudha.




Weekly Sync Minutes 20200818

2020-08-18 Thread Vinoth Chandar

https://cwiki.apache.org/confluence/display/HUDI/20200818+Weekly+Sync+Minutes

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread vino yang

> the key challenge has been keeping checkstyle, IDE and spotless agreeing
> on the same thing.

Yes, that's the key thing. But IMO we can ignore the IDE here: if it breaks
the code style, checkstyle will fail the build and spotless can fix it.

On Wed, Aug 19, 2020 at 7:49 AM Vinoth Chandar wrote:

> the key challenge has been keeping checkstyle, IDE and spotless agreeing on
> the same thing.
>
> your understanding is correct. CI will enforce in a similar fashion.
> Spotless just makes us productive by auto-fixing all the checkstyle
> violations, without having to fix them by hand.
>
> On Tue, Aug 18, 2020 at 4:42 PM Shiyan Xu wrote:
>
> > I think adding spotless as a tooling command to auto-fix code is
> > beneficial and not harmful.
> > People are recommended to run it before committing or to configure it in
> > a pre-commit hook.
> > From the CI point of view, it does not change the existing way of guarding
> > code style, does it? It'll still just run Checkstyle to report issues.
> > @Vinoth, am I understanding this correctly? Will Spotless be based on the
> > same style configured via Checkstyle?
> >
> > On Tue, Aug 18, 2020 at 4:16 PM vbal...@apache.org wrote:
> >
> > > +1 on standardizing code formatting. On Tuesday, August 18, 2020,
> > > 03:58:42 PM PDT, Vinoth Chandar wrote:
> > >
> > > can more people please chime in? This will affect all of us on a daily
> > > basis :)
> > >
> > > On Thu, Aug 13, 2020 at 8:25 AM Gary Li wrote:
> > >
> > > > Vote for mvn spotless:apply to do the auto fix.
> > > >
> > > > On Thu, Aug 13, 2020 at 1:13 AM Vinoth Chandar wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Does anyone have thoughts on this?
> > > > >
> > > > > Especially leesf/vinoyang, given you both drove much of the initial
> > > > > cleanups.
> > > > >
> > > > > On Mon, Aug 10, 2020 at 7:16 PM Shiyan Xu
> > > > > <xu.shiyan.raym...@gmail.com> wrote:
> > > > >
> > > > > > in that case, yes, all for automation.
> > > > > >
> > > > > > On Mon, Aug 10, 2020 at 7:12 PM Vinoth Chandar <vin...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Overall, I think we should standardize this across the project.
> > > > > > > But most importantly, maybe revive the long-dormant spotless
> > > > > > > effort first to enable autofixing of checkstyle issues, before
> > > > > > > we add more checking?
> > > > > > >
> > > > > > > On Mon, Aug 10, 2020 at 7:04 PM Shiyan Xu
> > > > > > > <xu.shiyan.raym...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I noticed that throughout the codebase, when method arguments
> > > > > > > > wrap to a new line, there are cases where the indentation is 4
> > > > > > > > and other cases where the wrapped line is aligned with the
> > > > > > > > arguments on the previous line.
> > > > > > > >
> > > > > > > > The latter is caused by IntelliJ's "Align when multiline"
> > > > > > > > setting being enabled. Checkstyle doesn't flag it because
> > > > > > > > *forceStrictCondition* is not set to *true*:
> > > > > > > > https://checkstyle.sourceforge.io/config_misc.html#Indentation_Properties
> > > > > > > >
> > > > > > > > I'm suggesting setting this to true to avoid the discrepancy and
> > > > > > > > the redundant diffs in PRs caused by individual IDE settings.
> > > > > > > > People who have set "Align when multiline" will need to disable
> > > > > > > > it to pass the checkstyle validation.
> > > > > > > >
> > > > > > > > WDYT?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Raymond
>
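To make Raymond's description concrete, here is a small, hypothetical Java file showing both wrapping styles (`WrapStyles` and its methods are illustrative names only). Checkstyle's Indentation documentation describes `forceStrictCondition=true` as requiring a wrapped line's indent to equal `lineWrappingIndentation` exactly, rather than allowing any larger indent. Assuming the project's `lineWrappingIndentation` is 4 (inferred from the "indentation is 4" cases in the thread, not confirmed), style 2 is the one the stricter check would be expected to flag.

```java
// Hypothetical example: both methods make the same call; only the
// indentation of the wrapped argument line differs.
final class WrapStyles {
    static String join(String a, String b, String c) {
        return a + "," + b + "," + c;
    }

    // Style 1: the wrapped line gets a fixed continuation indent,
    // 4 spaces past the start of the statement.
    static String continuationIndent() {
        return join("first",
            "second", "third");
    }

    // Style 2: IntelliJ's "Align when multiline" lines the wrapped
    // arguments up under the first argument instead.
    static String alignedWhenMultiline() {
        return join("first",
                    "second", "third");
    }

    public static void main(String[] args) {
        // Same call, same result: only whitespace differs, which is why
        // mixed IDE settings show up as noise in PR diffs.
        System.out.println(continuationIndent().equals(alignedWhenMultiline())); // prints true
    }
}
```

Because the two forms are behaviorally identical, letting a formatter such as spotless rewrite style 2 into style 1 is safe; the only visible change is the diff.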

[DISCUSS] Support Spark Structured Streaming read from Hudi table

2020-08-18 Thread linshan

Hi team,

I need help. After a few days of thinking and trial and error, I have run
out of ideas. I have written up the relevant information in this issue:
https://issues.apache.org/jira/browse/HUDI-1126

Best,
linshan-ma

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread Vinoth Chandar

The key challenge has been keeping checkstyle, the IDE and spotless agreeing
on the same thing.

Your understanding is correct. CI will enforce style in a similar fashion.
Spotless just makes us more productive by auto-fixing all the checkstyle
violations, so we don't have to fix them by hand.

[DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-18 Thread Abhishek Modi

Hi everyone!

I was hoping to discuss adding support for making `_hoodie_record_key` a
virtual column :)

Context:
Currently, _hoodie_record_key is written to DFS as a column in the Parquet
file. In our production systems at Uber, however, _hoodie_record_key
contains data that can also be found in another column (or set of columns),
which means we are storing duplicated data.

Proposal:
In the interest of storage efficiency, we could add configs / abstract
classes that construct _hoodie_record_key from other columns. That way we
would not have to store the duplicated data on DFS.

Any thoughts on this?

Best,
Modi
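One way to picture the proposal is the minimal Java sketch below, which assumes the key can always be recomputed deterministically from values already stored in the row. `VirtualRecordKey`, `keyFor` and the `field:value,field:value` composite format are hypothetical names and an assumed format chosen for illustration; they are not Hudi classes or a confirmed key layout, and a real implementation would have to plug into Hudi's own key generation and read paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not an existing Hudi API): rebuild the record key from
// columns that are already stored, instead of persisting _hoodie_record_key.
final class VirtualRecordKey {
    private final List<String> keyFields;

    VirtualRecordKey(List<String> keyFields) {
        if (keyFields == null || keyFields.isEmpty()) {
            throw new IllegalArgumentException("at least one key field is required");
        }
        this.keyFields = new ArrayList<>(keyFields);
    }

    // Derives the key for one row. A single field yields its value as-is;
    // several fields yield "field:value" pairs joined by commas (assumed format).
    String keyFor(Map<String, Object> row) {
        if (keyFields.size() == 1) {
            return valueOf(row, keyFields.get(0));
        }
        StringBuilder key = new StringBuilder();
        for (String field : keyFields) {
            if (key.length() > 0) {
                key.append(',');
            }
            key.append(field).append(':').append(valueOf(row, field));
        }
        return key.toString();
    }

    private static String valueOf(Map<String, Object> row, String field) {
        Object value = row.get(field);
        if (value == null) {
            // A record key must never be null, so fail instead of guessing.
            throw new IllegalStateException("key field '" + field + "' is null or missing");
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("trip_id", 42L);
        row.put("city", "sf");
        row.put("fare", 12.5);
        System.out.println(new VirtualRecordKey(Arrays.asList("trip_id")).keyFor(row)); // prints 42
        System.out.println(new VirtualRecordKey(Arrays.asList("city", "trip_id")).keyFor(row)); // prints city:sf,trip_id:42
    }
}
```

Failing on a null or missing key field, rather than substituting a default, keeps a derived key from silently colliding with another record's key.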

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread vbal...@apache.org

+1 on standardizing code formatting.

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread Vinoth Chandar

Can more people please chime in? This will affect all of us on a daily
basis :)

Re: Recommendation to load HUDI data across partitions

2020-08-18 Thread Vinoth Chandar

Great! Glad you got it working!

On Fri, Aug 14, 2020 at 6:46 PM tanu dua wrote:

> Thanks Vinoth for the detailed explanation. I was about to reply that it
> worked; I followed most of the steps you mentioned below.
> I used foreachBatch() on the stream to process each batch of data from
> Kafka, found the partitions using aggregate functions on the Kafka Dataset,
> and then fed those partitions to Hudi as a glob pattern to get hudiDs.
> Then I performed a join on both Datasets. I had some complex logic to
> derive from both kafkaDs and hudiDs, hence the flatMap, but I have now been
> able to remove the flatMap and use Dataset joins instead.
>
> Thanks again for all your help as always!!
>
> On Thu, Aug 13, 2020 at 1:42 PM Vinoth Chandar wrote:
>
> > Hi Tanuj,
> >
> > From this example, it appears as if you are trying to use sparkSession
> > from within the executor? This will be problematic. Can you please open a
> > support ticket with the full stack trace?
> >
> > I think what you are describing is a join between Kafka and Hudi tables.
> > So I'd read from Kafka first, cache the 2K messages in memory, find out
> > which partitions they belong to, and only load those affected partitions
> > instead of the entire table.
> > At this point, you will have two datasets: kafkaDF and hudiDF (or RDD or
> > Dataset; my suggestion remains valid).
> > Instead of hand-crafting the join at the record level like you have, you
> > can just use RDD/Dataset-level join operations and get a resultDF.
> >
> > Then you do a resultDF.write.format("hudi") and you are done?
> >
> > On Tue, Aug 11, 2020 at 2:33 AM Tanuj wrote:
> >
> > > Hi,
> > > I have a problem statement where I am consuming messages from Kafka,
> > > and then, depending on those Kafka messages (2K records), I need to
> > > query a Hudi table, create a dataset (with both updates and inserts),
> > > and push it back to the Hudi table.
> > >
> > > I tried the following, but it threw an NPE from the sparkSession Scala
> > > code, and rightly so, since sparkSession was used in the executor.
> > >
> > >     Dataset hudiDs = companyStatusDf.flatMap(new FlatMapFunction() {
> > >         @Override
> > >         public Iterator call(KafkaRecord kafkaRecord) throws Exception {
> > >             String prop1 = kafkaRecord.getProp1();
> > >             String prop2 = kafkaRecord.getProp2();
> > >             // This runs on an executor, where sparkSession is not
> > >             // usable; this is what raised the NPE.
> > >             HudiRecord hudiRecord = sparkSession.read()
> > >                     .format(HUDI_DATASOURCE)
> > >                     .schema()
> > >                     .load()
> > >                     .as(Encoders.bean((HudiRecord.class)))
> > >                     .filter( say prop1);
> > >             hudiRecord = transform();
> > >             // Modification in hudi record
> > >             return Arrays.asList(kafkaRecord, hudiRecord).iterator();
> > >         }
> > >     }, Encoders.bean(CompanyStatusGoldenRecord.class));
> > >
> > > In Hudi, I have 2 levels of partitions (year and month). So, for
> > > example, if I get 2K records from Kafka that span multiple partitions,
> > > what is advisable: first load the full table, like "/*/*/*", or first
> > > read the Kafka records, find out which partitions need to be hit, and
> > > then load only those Hudi partitions? I believe the 2nd option, loading
> > > only the specific partitions, would be faster, and that's what I was
> > > trying in the snippet above. So if I have to leverage partitions, is
> > > calling collect() on the Kafka Dataset to get the list of partitions and
> > > then supplying them to Hudi the only option, or can I do it just with
> > > Spark Datasets?
> > >
> >
>
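The approach Vinoth suggested and Tanuj confirmed hinges on one step: turning a micro-batch of Kafka records into the small set of year/month partitions it touches, so that only those partitions are read instead of the whole-table glob. Below is a minimal, stdlib-only Java sketch of that step. `AffectedPartitions`, `KafkaRecord`, the `/data/table` base path and the zero-padded `yyyy/MM` directory layout are all hypothetical stand-ins for Tanuj's actual schema and table location.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical sketch: compute only the year/month partitions touched by one
// Kafka micro-batch, so the Hudi read can be limited to those partitions.
final class AffectedPartitions {

    // Minimal stand-in for the partitioning fields of a Kafka message.
    static final class KafkaRecord {
        final int year;
        final int month;

        KafkaRecord(int year, int month) {
            this.year = year;
            this.month = month;
        }
    }

    // Returns one path pattern per distinct (year, month), sorted, assuming a
    // zero-padded basePath/yyyy/MM layout with data files directly underneath.
    static List<String> affectedPartitionGlobs(String basePath, List<KafkaRecord> batch) {
        Objects.requireNonNull(basePath, "basePath");
        TreeSet<String> globs = new TreeSet<>(); // de-duplicates and sorts
        for (KafkaRecord record : batch) {
            globs.add(String.format(Locale.ROOT, "%s/%04d/%02d/*",
                basePath, record.year, record.month));
        }
        return new ArrayList<>(globs);
    }

    public static void main(String[] args) {
        List<KafkaRecord> batch = new ArrayList<>();
        batch.add(new KafkaRecord(2020, 8));
        batch.add(new KafkaRecord(2020, 7));
        batch.add(new KafkaRecord(2020, 8));
        // prints [/data/table/2020/07/*, /data/table/2020/08/*]
        System.out.println(affectedPartitionGlobs("/data/table", batch));
    }
}
```

In the real pipeline, the distinct (year, month) pairs would come from an aggregation over the Kafka batch inside `foreachBatch`. Since there are far fewer partitions than records, bringing just those values to the driver should be cheap. How the resulting patterns are handed to the Hudi reader (Tanuj mentions a glob pattern) depends on the Spark and Hudi versions in use and is not shown here.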

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-18 Thread Bhavani Sudha

Quick update on the RC.

Found a build issue when building for Scala 2.12 and sent a PR for it:
https://github.com/apache/hudi/pull/1976. Working on resolving this in the
release branch and updating the RC. Will update soon.

Thanks,
Sudha

On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar wrote:

> Thanks Sudha! This means master is now open for regular PRs. Thanks for
> your patience, everyone.
>
> On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha wrote:
>
> > Hello all,
> >
> > We have cut the release branch:
> > https://github.com/apache/hudi/tree/release-0.6.0. Since it is already
> > Friday, we will send out the release candidate early next week (after
> > some testing).
> >
> > Happy Friday!
> >
> > Thanks,
> > Sudha
> >
> > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org wrote:
> >
> > > Hi Folks,
> > > We are continuing to work on CI stabilization and will cut the release
> > > once the builds are stable, hopefully tonight/tomorrow.
> > > Thanks,
> > > Balaji.V
> > >
> > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar
> > > <vin...@apache.org> wrote:
> > >
> > > Hello all,
> > >
> > > Update on this. We have landed most of the blockers for the 0.6.0
> > > release, and I am currently working on the last major blocker,
> > > HUDI-1013.
> > > We are working through some unexpected CI flakiness. We hope to
> > > stabilize master, cut the RC, and then open up master for regular PR
> > > merges. ETA for this is tomorrow night (Aug 12, PST).
> > >
> > > We will keep this thread posted!
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar wrote:
> > >
> > > > Small correction:
> > > >
> > > > >> Vinoth working on code review, tests for PR 1876,
> > > > This has landed!
> > > >
> > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <bhavanisud...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello all,
> > > >>
> > > >> We are targeting the end of this week to cut the RC. Here is an
> > > >> update on where we are with the release blockers.
> > > >>
> > > >> 0.6.0 release blocker status (board:
> > > >> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397=HUDI=detail=HUDI-69 ):
> > > >>
> > > >>    - Spark Datasource/MOR https://github.com/apache/hudi/pull/1848
> > > >>    needs to be tested by Gary/Balaji (About to land)
> > > >>    - Hive Sync restructuring (Review done, about to land)
> > > >>    - Bootstrap
> > > >>       - Vinoth working on code review, tests for PR 1876,
> > > >>       - then Udit will rework PR 1702 (In code review)
> > > >>       - then we will review and land PRs 1870, 1869
> > > >>    - Bulk insert V2 PR 1834, lower risk, independent PR, already
> > > >>    well tested
> > > >>       - Dependent PR 1149 to be landed,
> > > >>       - and modes to be respected in the V2 impl as well (At risk)
> > > >>    - Upgrade/Downgrade Hooks, PR 1858 (In code review)
> > > >>    - HUDI-1054: Marker list perf improvement, Udit has a PR out
> > > >>    - HUDI-115: Overwrite with... ordering issue, Sudha has a PR
> > > >>    nearing landing
> > > >>    - HUDI-1098: Marker file issue with non-existent files (In code
> > > >>    review)
> > > >>    - Spark Streaming + Async Compaction: test complete; address code
> > > >>    review comments and land PR 1752 (About to land)
> > > >>    - Spark DataSource/Hive MOR Incremental Query, HUDI-920 (At risk)
> > > >>    - Flink/Multi Engine refactor: will need a large rebase and
> > > >>    rework, review, land (At risk for 0.6.0)
> > > >>    - BloomIndex V2: Global index implementation (At risk)
> > > >>    - HUDI-845: Parallel writing, i.e. allow multiple writers (Pushed
> > > >>    out of 0.6.0)
> > > >>    - HUDI-860: Small File Handling without memory caching (Pushed
> > > >>    out of 0.6.0)
> > > >>
> > > >> Thanks,
> > > >> Sudha
> > > >>
> > > >> On Mon, Aug 3, 2020 at 3:41 PM Vinoth Chandar wrote:
> > > >>
> > > >> > +1 (we need to formalize this well)
> > > >> > But having just blockers land first would help not just with
> > > >> > rebasing, but also with winding down towards cutting an RC by the
> > > >> > end of the week.
> > > >> >
> > > >> > On Mon, Aug 3, 2020 at 2:53 PM Bhavani Sudha
> > > >> > <bhavanisud...@gmail.com> wrote:
> > > >> >
> > > >> > > Hello all,
> > > >> > >
> > > >> > > As we are all hustling towards getting the blockers in, I wanted
> > > >> > > to propose a code/merge freeze until we cut the 0.6.0 release,
> > > >> > > restricting merges to only the blockers identified for this
> > > >> > > release. It would reduce rebasing time for blockers in progress.
> > > >> > > If we feel some issue is a serious blocker, we can discuss it
> > > >> > > here and bump its priority.
> > > >> > >
> > > >> > > Please share your thoughts or concerns.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Sudha
> > > >> > >
