[VOTE] Release 0.6.0, release candidate #1

2020-08-19 Thread Bhavani Sudha
Hi everyone,
Please review and vote on release candidate #1 for version 0.6.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 7F66CD4CE990983A284672293224F200E1FC2172 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-0.6.0-rc1" [5].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.6.0-rc1/
[3] https://dist.apache.org/repos/dist/release/hudi/KEYS
[4] https://repository.apache.org/content/repositories/orgapachehudi-1025/
[5] https://github.com/apache/hudi/tree/release-0.6.0-rc1


Re: Request Contributor Access to JIRA

2020-08-19 Thread vino yang
Hi Jack,

Done, and welcome to the Hudi community!

Best,
Vino

On Thu, Aug 20, 2020 at 8:34 AM Jack Ye wrote:

> Hi,
>
> I would like to request contributor access to the Hudi JIRA, my username is
> jackye.
>
> Thank you very much,
>
> Best,
> Jack Ye
>


Request Contributor Access to JIRA

2020-08-19 Thread Jack Ye
Hi,

I would like to request contributor access to the Hudi JIRA, my username is
jackye.

Thank you very much,

Best,
Jack Ye


Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Pratyaksh Sharma
Hi Allen,

Yes, it's a serialization (runtime) issue. I am working on fixing it.

On Wed, Aug 19, 2020 at 7:04 PM Sivabalan  wrote:

> That would be of great help Allen. Much appreciated.
>
> On Wed, Aug 19, 2020 at 9:30 AM Allen Underwood
>  wrote:
>
> > Thanks Sivabalan,
> >
> > That's definitely the issue I had to resolve when introducing Joda time.
> > I'll have a look back at my code to see how I got around it.  If I remember
> > correctly, in the original code, the DateTime object was being saved as an
> > instance variable on the object and I had to do things differently in the
> > Joda time approach - but it turned out not to be a big deal because the
> > instance variable wasn't needed.
> >
> > I'll see if I can get a PR in to help fix this.
> >
> > The worst part is no unit tests catch this - it's a runtime issue I assume
> > due to how Spark serializes state.
> >
> > Thanks,
> >
> > Allen

Re: HUDI Read | Leverage Partitions

2020-08-19 Thread tanu dua
I am so sorry to bother you. It worked; there was a typo on my side. My
apologies.
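
For readers following the glob discussion quoted below, here is a minimal sketch of building such a load path in Java. The class and method names (PartitionGlobSketch, monthGlob) and the zeroPad flag are hypothetical and not part of Hudi or Spark; the sketch only assembles the string that would then be passed to load(), in the way Gary's example shows. A glob alternative has to match the directory name exactly, so {3,4} will not match directories named 03 and 04.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class PartitionGlobSketch {

    /**
     * Builds "<basePath>/<year>/{m1,m2,...}" for a year/month directory layout.
     * zeroPad controls whether months are written as "05" or "5"; it must
     * mirror how the partition directories are actually named on disk.
     */
    static String monthGlob(String basePath, int year, List<Integer> months, boolean zeroPad) {
        if (months.isEmpty()) {
            throw new IllegalArgumentException("at least one month is required");
        }
        String alternatives = months.stream()
                .map(m -> zeroPad ? String.format(Locale.ROOT, "%02d", m) : Integer.toString(m))
                .collect(Collectors.joining(","));
        // Drop a single trailing slash so the result never contains "//".
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return base + "/" + year + "/{" + alternatives + "}";
    }

    public static void main(String[] args) {
        // Hypothetical table location, used only for illustration.
        System.out.println(monthGlob("/data/hudi/table/", 2020, Arrays.asList(5, 6), true));
    }
}
```

Printing the generated path before handing it to Spark makes a typo in the base path or the month format easy to spot.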

On Wed, Aug 19, 2020 at 7:01 PM tanu dua  wrote:

> Hi Gary,
> I am getting an exception while loading HUDI tables using glob path. Does
> it work ? Have someone tried it ? If I use without {} it works
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist:
> file:/C:/Hudi/data/co/A/2019/{3,4};
> at
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:552)
> at
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
>
> On Tue, Jun 30, 2020 at 7:39 PM Tanuj  wrote:
>
>> Thanks a lot. I understand now.
>>
>> On 2020/06/27 02:45:52, Gary Li  wrote:
>> > Hi,
>> >
>> > If you use year=xxx/month=xxx folder structure, you can use Dataset
>> > df=
>> >
>> spark.read().format("hudi").schema(schema).load(+).
>> > Without a glob postfix, Spark can automatically load the partition
>> > information, just like regular parquet files.
>> >
>> https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery
>> >
>> > If you use something like 2020/06, you may need to build the glob string
>> > and add it to the load() to skip the unnecessary partitions. e.g.
>> > .load(++"2020/{05,06}")
>> >
>> > Or list one parquet file from different partitions and use a map
>> function
>> > to load 1 row from each path with a limit clause.
>> >
>> > On Fri, Jun 26, 2020 at 8:33 AM Tanuj  wrote:
>> >
>> > > Hi,
>> > > We have created a table with partition depth of 2 as year/month. We
>> need
>> > > to read data from HUDI in Spark Streaming layer where we get the
>> batch data
>> > > of say 10 rows which we need to use to read from HUDI. We are reading
>> it
>> > > like -
>> > >
>> > > // Read from HUDI
>> > > Dataset df=
>> > >
>> spark.read().format("hudi").schema(schema).load(++"/*/*")
>> > >
>> > > //Apply filter
>> > >
>> > >
>> df=df.filter(df.col("year").isin().filter(df.col("month").isin()).filter(df.col("id").isin());
>> > >
>> > > Is it the best way to read the data ? Will HUDI take care of just
>> reading
>> > > from the partitions or we need to take care of ? For eg. If I need to
>> read
>> > > just 1 row we can build the full path and then read which will read
>> the
>> > > parquet file from that partition quickly but here our requirement is
>> to
>> > > read data from multiple partitions.
>> > >
>> > >
>> > >
>> >
>>
>


Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Sivabalan
That would be of great help Allen. Much appreciated.

On Wed, Aug 19, 2020 at 9:30 AM Allen Underwood
 wrote:

> Thanks Sivabalan,
>
> That's definitely the issue I had to resolve when introducing Joda time.
> I'll have a look back at my code to see how I got around it.  If I remember
> correctly, in the original code, the DateTime object was being saved as an
> instance variable on the object and I had to do things differently in the
> Joda time approach - but it turned out not to be a big deal because the
> instance variable wasn't needed.
>
> I'll see if I can get a PR in to help fix this.
>
> The worst part is no unit tests catch this - it's a runtime issue I assume
> due to how Spark serializes state.
>
> Thanks,
>
> Allen

Re: HUDI Read | Leverage Partitions

2020-08-19 Thread tanu dua
Hi Gary,
I am getting an exception while loading HUDI tables using a glob path. Does
it work? Has anyone tried it? Without the {} it works.
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist:
file:/C:/Hudi/data/co/A/2019/{3,4};
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:552)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)

On Tue, Jun 30, 2020 at 7:39 PM Tanuj  wrote:

> Thanks a lot. I understand now.
>
> On 2020/06/27 02:45:52, Gary Li  wrote:
> > Hi,
> >
> > If you use year=xxx/month=xxx folder structure, you can use Dataset
> > df=
> >
> spark.read().format("hudi").schema(schema).load(+).
> > Without a glob postfix, Spark can automatically load the partition
> > information, just like regular parquet files.
> >
> https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery
> >
> > If you use something like 2020/06, you may need to build the glob string
> > and add it to the load() to skip the unnecessary partitions. e.g.
> > .load(++"2020/{05,06}")
> >
> > Or list one parquet file from different partitions and use a map function
> > to load 1 row from each path with a limit clause.
> >
> > On Fri, Jun 26, 2020 at 8:33 AM Tanuj  wrote:
> >
> > > Hi,
> > > We have created a table with partition depth of 2 as year/month. We
> need
> > > to read data from HUDI in Spark Streaming layer where we get the batch
> data
> > > of say 10 rows which we need to use to read from HUDI. We are reading
> it
> > > like -
> > >
> > > // Read from HUDI
> > > Dataset df=
> > >
> spark.read().format("hudi").schema(schema).load(++"/*/*")
> > >
> > > //Apply filter
> > >
> > >
> df=df.filter(df.col("year").isin().filter(df.col("month").isin()).filter(df.col("id").isin());
> > >
> > > Is it the best way to read the data ? Will HUDI take care of just
> reading
> > > from the partitions or we need to take care of ? For eg. If I need to
> read
> > > just 1 row we can build the full path and then read which will read the
> > > parquet file from that partition quickly but here our requirement is to
> > > read data from multiple partitions.
> > >
> > >
> > >
> >
>


Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Allen Underwood
Thanks Sivabalan,

That's definitely the issue I had to resolve when introducing Joda time.
I'll have a look back at my code to see how I got around it.  If I remember
correctly, in the original code, the DateTime object was being saved as an
instance variable on the object and I had to do things differently in the
Joda time approach - but it turned out not to be a big deal because the
instance variable wasn't needed.

I'll see if I can get a PR in to help fix this.

The worst part is no unit tests catch this - it's a runtime issue I assume
due to how Spark serializes state.

Thanks,

Allen
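
The failure mode described above (a formatter held as an instance field, which only breaks once the object is serialized at runtime) can be sketched as follows. This is an illustration under stated assumptions, not the HUDI-1177 fix: EagerKeyGen and LazyKeyGen are hypothetical classes, and java.time.format.DateTimeFormatter (which is not Serializable) stands in for the Joda formatter so the example needs only the JDK. Keeping just the pattern string and rebuilding the formatter lazily in a transient field is one common workaround.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class TransientFormatterSketch {

    // Hypothetical key generator that stores the formatter as a regular field.
    static class EagerKeyGen implements Serializable {
        private static final long serialVersionUID = 1L;
        private final DateTimeFormatter formatter; // not Serializable

        EagerKeyGen(String pattern) {
            this.formatter = DateTimeFormatter.ofPattern(pattern);
        }

        String partitionPath(LocalDate date) {
            return formatter.format(date);
        }
    }

    // Hypothetical variant: only the pattern is serialized; the formatter is
    // transient and rebuilt on first use after deserialization.
    static class LazyKeyGen implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String pattern;
        private transient DateTimeFormatter formatter;

        LazyKeyGen(String pattern) {
            this.pattern = pattern;
        }

        String partitionPath(LocalDate date) {
            if (formatter == null) {
                formatter = DateTimeFormatter.ofPattern(pattern);
            }
            return formatter.format(date);
        }
    }

    // Java-serializes a value and reads it back, similar in spirit to what
    // happens when an object is shipped to another JVM.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns false when the object graph contains a non-serializable field.
    static boolean survivesRoundTrip(Serializable value) {
        try {
            roundTrip(value);
            return true;
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof NotSerializableException) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 8, 19);
        System.out.println(survivesRoundTrip(new EagerKeyGen("yyyy/MM/dd")));
        System.out.println(roundTrip(new LazyKeyGen("yyyy/MM/dd")).partitionPath(day));
    }
}
```

A round-trip helper of this kind can also be called from a unit test, which is one way to catch this class of problem before it shows up at runtime.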

On Wed, Aug 19, 2020 at 9:23 AM Sivabalan  wrote:

> I am not sure if all findings so far have been documented here, but AFAIK
> this is the ticket: https://issues.apache.org/jira/browse/HUDI-1177
>
> On Wed, Aug 19, 2020 at 9:15 AM Allen Underwood
>  wrote:
>
> > Just out of curiosity - what's the blocker - you have an issue?  I had
> > originally done the code to make that work.
> >
> > On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar 
> wrote:
> >
> >> We still have 1 blocker issue from TimestampKeyGenerator issue with joda
> >> DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this.
> >>
> >> In the meantime, here's the progress, plans around testing so far. If
> >> folks
> >> in the community can help test the release branch in the next couple of
> >> days, it would be of great help!
> >>
> >>
> >>- Testing plan here: Stability tested by balaji running spark
> >> streaming,
> >>Correctness (Nishith testing on release branch, Udit to help with EMR
> >> setup)
> >>- Testing plan continued: Spark datasource MOR, .. Marker based
> >>rollback, upgrade-downgrade (vinoth is on this)
> >>- Testing plan : bootstrap (balaji tested for hive), udit has it
> >> working
> >>for Presto in EMR.
> >>- Testing plan: bulk_insert v2 (vinoth tested for correctness, file
> >>sizes. ), performance (tested on a cluster, microbenchmarks)
> >>
> >>
> >> There will also be some blogs/docs to explain new stuff.
> >>
> >>- Blogs to explain new stuff: sort modes (vinoth), simple index
> >>(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark
> >>Streaming/DeltaStreamer continuous mode blog (balaji)
> >>- Docs: All missing docs will be added by Sudha.
> >>
> >>
> >>
> >>
> >> On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha 
> >> wrote:
> >>
> >> > Quick update on the RC.
> >> >
> >> > Found a build issue when building scala 2.12 and sent a PR for that -
> >> > https://github.com/apache/hudi/pull/1976 . Working on resolving this
> in
> >> > the
> >> > release branch and updating RC. Will update soon.
> >> >
> >> > Thanks,
> >> > Sudha
> >> >
> >> > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar 
> >> wrote:
> >> >
> >> > > Thanks Sudha! This means master is now open for regular PRs. Thanks
> >> > > for your patience, everyone.
> >> > >
> >> > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha <
> >> bhavanisud...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hello all,
> >> > > >
> >> > > > We have cut the release branch -
> >> > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is
> >> > already
> >> > > > Friday, we will be sending the release candidate early next week
> >> (after
> >> > > > some testing).
> >> > > >
> >> > > > Happy Friday!
> >> > > >
> >> > > > Thanks,
> >> > > > Sudha
> >> > > >
> >> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org <
> >> vbal...@apache.org
> >> > >
> >> > > > wrote:
> >> > > >
> >> > > > >
> >> > > > > Hi Folks,
> >> > > > > We are continuing to work on CI stabilization and will cut the
> >> > release
> >> > > > > once we stabilize the builds hopefully tonight/tomorrow.
> >> > > > > Thanks,Balaji.V
> >> > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth
> Chandar <
> >> > > > > vin...@apache.org> wrote:
> >> > > > >
> >> > > > >  Hello all,
> >> > > > >
> >> > > > > Update on this. We have landed most of the blockers for the
> 0.6.0
> >> > > release
> >> > > > > and I am currently working on the last major blocker, HUDI-1013.
> >> > > > > We are working through some unexpected CI flakiness. We hope to
> >> > > stabilize
> >> > > > > master, cut the RC, and then open up master for regular PR
> merges.
> >> > > > > ETA for this is tomorrow night PST (Aug 12, PST).
> >> > > > >
> >> > > > > We will keep this thread posted!
> >> > > > >
> >> > > > > Thanks
> >> > > > > Vinoth
> >> > > > >
> >> > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar <
> vin...@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > > > Small correction:
> >> > > > > >
> >> > > > > > >> Vinoth working on code review, tests for PR 1876,
> >> > > > > > This is landed!
> >> > > > > >
> >> > > > > >
> >> > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <
> >> > > bhavanisud...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > >> Hello all,
> >> > > > > >>
> >> > > > > >> We are targeting the end of 

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Sivabalan
I am not sure if all findings so far have been documented here, but AFAIK
this is the ticket: https://issues.apache.org/jira/browse/HUDI-1177

On Wed, Aug 19, 2020 at 9:15 AM Allen Underwood
 wrote:

> Just out of curiosity - what's the blocker - you have an issue?  I had
> originally done the code to make that work.
>
> On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar  wrote:
>
>> We still have 1 blocker issue from TimestampKeyGenerator issue with joda
>> DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this.
>>
>> In the meantime, here's the progress, plans around testing so far. If
>> folks
>> in the community can help test the release branch in the next couple of
>> days, it would be of great help!
>>
>>
>>- Testing plan here: Stability tested by balaji running spark
>> streaming,
>>Correctness (Nishith testing on release branch, Udit to help with EMR
>> setup)
>>- Testing plan continued: Spark datasource MOR, .. Marker based
>>rollback, upgrade-downgrade (vinoth is on this)
>>- Testing plan : bootstrap (balaji tested for hive), udit has it
>> working
>>for Presto in EMR.
>>- Testing plan: bulk_insert v2 (vinoth tested for correctness, file
>>sizes. ), performance (tested on a cluster, microbenchmarks)
>>
>>
>> There will also be some blogs/docs to explain new stuff.
>>
>>- Blogs to explain new stuff: sort modes (vinoth), simple index
>>(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark
>>Streaming/DeltaStreamer continuous mode blog (balaji)
>>- Docs: All missing docs will be added by Sudha.
>>
>>
>>
>>
>> On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha 
>> wrote:
>>
>> > Quick update on the RC.
>> >
>> > Found a build issue when building scala 2.12 and sent a PR for that -
>> > https://github.com/apache/hudi/pull/1976 . Working on resolving this in
>> > the
>> > release branch and updating RC. Will update soon.
>> >
>> > Thanks,
>> > Sudha
>> >
>> > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar 
>> wrote:
>> >
>> > > Thanks Sudha! This means master is now open for regular PRs. Thanks
>> > > for your patience, everyone.
>> > >
>> > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha <
>> bhavanisud...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > We have cut the release branch -
>> > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is
>> > already
>> > > > Friday, we will be sending the release candidate early next week
>> (after
>> > > > some testing).
>> > > >
>> > > > Happy Friday!
>> > > >
>> > > > Thanks,
>> > > > Sudha
>> > > >
>> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org <
>> vbal...@apache.org
>> > >
>> > > > wrote:
>> > > >
>> > > > >
>> > > > > Hi Folks,
>> > > > > We are continuing to work on CI stabilization and will cut the
>> > release
>> > > > > once we stabilize the builds hopefully tonight/tomorrow.
>> > > > > Thanks,Balaji.V
>> > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar <
>> > > > > vin...@apache.org> wrote:
>> > > > >
>> > > > >  Hello all,
>> > > > >
>> > > > > Update on this. We have landed most of the blockers for the 0.6.0
>> > > release
>> > > > > and I am currently working on the last major blocker, HUDI-1013.
>> > > > > We are working through some unexpected CI flakiness. We hope to
>> > > stabilize
>> > > > > master, cut the RC, and then open up master for regular PR merges.
>> > > > > ETA for this is tomorrow night PST (Aug 12, PST).
>> > > > >
>> > > > > We will keep this thread posted!
>> > > > >
>> > > > > Thanks
>> > > > > Vinoth
>> > > > >
>> > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar 
>> > > wrote:
>> > > > >
>> > > > > > Small correction:
>> > > > > >
>> > > > > > >> Vinoth working on code review, tests for PR 1876,
>> > > > > > This is landed!
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <
>> > > bhavanisud...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > >> Hello all,
>> > > > > >>
>> > > > > >> We are targeting the end of this week to cut RC. Here is an
>> update
>> > > of
>> > > > > >> where
>> > > > > >> we are at release blockers.
>> > > > > >>
>> > > > > >> 0.6.0 Release blocker status (board
>> > > > > >> <
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69
>> > > > > >> >)
>> > > > > >> ,
>> > > > > >>
>> > > > > >>- Spark Datasource/MOR
>> > https://github.com/apache/hudi/pull/1848
>> > > > > needs
>> > > > > >> to
>> > > > > >>be tested by gary/balaji (About to land)
>> > > > > >>- Hive Sync restructuring (Review done, about to land)
>> > > > > >>- Bootstrap
>> > > > > >>  - Vinoth working on code review, tests for PR 1876,
>> > > > > >>  - then udit will rework PR 1702 (In Code review)
>> > > > > >>  - then we will review, land PR 1870, 1869
>> > > > > >>- Bulk insert V2 PR 1834, lowe

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread Allen Underwood
Just out of curiosity, what's the blocker? Is there an issue filed for it?
I originally wrote the code to make that work.

On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar  wrote:

> We still have 1 blocker issue from TimestampKeyGenerator issue with joda
> DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this.
>
> In the meantime, here's the progress, plans around testing so far. If folks
> in the community can help test the release branch in the next couple of
> days, it would be of great help!
>
>
>- Testing plan here: Stability tested by balaji running spark streaming,
>Correctness (Nishith testing on release branch, Udit to help with EMR
> setup)
>- Testing plan continued: Spark datasource MOR, .. Marker based
>rollback, upgrade-downgrade (vinoth is on this)
>- Testing plan : bootstrap (balaji tested for hive), udit has it working
>for Presto in EMR.
>- Testing plan: bulk_insert v2 (vinoth tested for correctness, file
>sizes. ), performance (tested on a cluster, microbenchmarks)
>
>
> There will also be some blogs/docs to explain new stuff.
>
>- Blogs to explain new stuff: sort modes (vinoth), simple index
>(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark
>Streaming/DeltaStreamer continuous mode blog (balaji)
>- Docs: All missing docs will be added by Sudha.
>
>
>
>
> On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha 
> wrote:
>
> > Quick update on the RC.
> >
> > Found a build issue when building scala 2.12 and sent a PR for that -
> > https://github.com/apache/hudi/pull/1976 . Working on resolving this in
> > the
> > release branch and updating RC. Will update soon.
> >
> > Thanks,
> > Sudha
> >
> > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar 
> wrote:
> >
> > > Thanks Sudha! This means master is now open for regular PRs. Thanks
> > > for your patience, everyone.
> > >
> > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha  >
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > We have cut the release branch -
> > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is
> > already
> > > > Friday, we will be sending the release candidate early next week
> (after
> > > > some testing).
> > > >
> > > > Happy Friday!
> > > >
> > > > Thanks,
> > > > Sudha
> > > >
> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org <
> vbal...@apache.org
> > >
> > > > wrote:
> > > >
> > > > >
> > > > > Hi Folks,
> > > > > We are continuing to work on CI stabilization and will cut the
> > release
> > > > > once we stabilize the builds hopefully tonight/tomorrow.
> > > > > Thanks,Balaji.V
> > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar <
> > > > > vin...@apache.org> wrote:
> > > > >
> > > > >  Hello all,
> > > > >
> > > > > Update on this. We have landed most of the blockers for the 0.6.0
> > > release
> > > > > and I am currently working on the last major blocker, HUDI-1013.
> > > > > We are working through some unexpected CI flakiness. We hope to
> > > stabilize
> > > > > master, cut the RC, and then open up master for regular PR merges.
> > > > > ETA for this is tomorrow night PST (Aug 12, PST).
> > > > >
> > > > > We will keep this thread posted!
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar 
> > > wrote:
> > > > >
> > > > > > Small correction:
> > > > > >
> > > > > > >> Vinoth working on code review, tests for PR 1876,
> > > > > > This is landed!
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <
> > > bhavanisud...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hello all,
> > > > > >>
> > > > > >> We are targeting the end of this week to cut RC. Here is an
> update
> > > of
> > > > > >> where
> > > > > >> we are at release blockers.
> > > > > >>
> > > > > >> 0.6.0 Release blocker status (board
> > > > > >> <
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69
> > > > > >> >)
> > > > > >> ,
> > > > > >>
> > > > > >>- Spark Datasource/MOR
> > https://github.com/apache/hudi/pull/1848
> > > > > needs
> > > > > >> to
> > > > > >>be tested by gary/balaji (About to land)
> > > > > >>- Hive Sync restructuring (Review done, about to land)
> > > > > >>- Bootstrap
> > > > > >>  - Vinoth working on code review, tests for PR 1876,
> > > > > >>  - then udit will rework PR 1702 (In Code review)
> > > > > >>  - then we will review, land PR 1870, 1869
> > > > > >>- Bulk insert V2 PR 1834, lower risk, independent PR, well
> > tested
> > > > > >> already
> > > > > >>  - Dependent PR 1149 to be landed,
> > > > > >>  - and modes to be respected in V2 impl as well (At risk)
> > > > > >>- Upgrade Downgrade Hooks, PR 1858 : (In Code review)
> > > > > >>- HUDI-1054- Marker list perf improvement, Udit has a PR out
> > > > > >>- HUDI-115 : Overwrite with... 

Re: JIRA contributor permission

2020-08-19 Thread vino yang
Hi Guoguang,

Done, and welcome to the Hudi community!

Best,
Vino

On Wed, Aug 19, 2020 at 6:47 PM Guoguang.Wang wrote:

> Hi,
> I want to contribute to Apache Hudi.
> Would you please give me the contributor permission?
> My JIRA ID is guoguang.wang


JIRA contributor permission

2020-08-19 Thread Guoguang.Wang
Hi,
I want to contribute to Apache Hudi.
Would you please give me the contributor permission?
My JIRA ID is guoguang.wang

Re: Re: [DISCUSS] Release 0.6.0 timelines

2020-08-19 Thread 957029...@qq.com

Hi, vc
I would also like to run some tests against the release branch.



957029...@qq.com
 
From: Vinoth Chandar
Date: 2020-08-19 12:51
To: dev
Subject: Re: [DISCUSS] Release 0.6.0 timelines
We still have 1 blocker: a TimestampKeyGenerator issue with the joda
DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this.
 
In the meantime, here's the progress and plans around testing so far. If folks
in the community can help test the release branch in the next couple of
days, it would be of great help!
 
 
   - Testing plan here: Stability tested by balaji running spark streaming,
   Correctness (Nishith testing on release branch, Udit to help with EMR setup)
   - Testing plan continued: Spark datasource MOR, .. Marker based
   rollback, upgrade-downgrade (vinoth is on this)
   - Testing plan : bootstrap (balaji tested for hive), udit has it working
   for Presto in EMR.
   - Testing plan: bulk_insert v2 (vinoth tested for correctness, file
   sizes. ), performance (tested on a cluster, microbenchmarks)
 
 
There will also be some blogs/docs to explain new stuff.
 
   - Blogs to explain new stuff: sort modes (vinoth), simple index
   (vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark
   Streaming/DeltaStreamer continuous mode blog (balaji)
   - Docs: All missing docs will be added by Sudha.
 
 
 
 
On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha 
wrote:
 
> Quick update on the RC.
>
> Found a build issue when building scala 2.12 and sent a PR for that -
> https://github.com/apache/hudi/pull/1976 . Working on resolving this in the
> release branch and updating RC. Will update soon.
>
> Thanks,
> Sudha
>
> On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar  wrote:
>
> > Thanks Sudha! This means master is now open for regular PRs. Thanks for
> > your patience, everyone.
> >
> > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha 
> > wrote:
> >
> > > Hello all,
> > >
> > > We have cut the release branch -
> > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is already
> > > Friday, we will be sending the release candidate early next week (after
> > > some testing).
> > >
> > > Happy Friday!
> > >
> > > Thanks,
> > > Sudha
> > >
> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org
> > > > wrote:
> > >
> > > >
> > > > Hi Folks,
> > > > > We are continuing to work on CI stabilization and will cut the release
> > > > > once we stabilize the builds, hopefully tonight/tomorrow.
> > > > > Thanks, Balaji.V
> > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar <
> > > > vin...@apache.org> wrote:
> > > >
> > > >  Hello all,
> > > >
> > > > > Update on this. We have landed most of the blockers for the 0.6.0 release
> > > > > and I am currently working on the last major blocker, HUDI-1013.
> > > > > We are working through some unexpected CI flakiness. We hope to stabilize
> > > > > master, cut the RC, and then open up master for regular PR merges.
> > > > > ETA for this is tomorrow night PST (Aug 12, PST).
> > > >
> > > > We will keep this thread posted!
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar 
> > wrote:
> > > >
> > > > > Small correction:
> > > > >
> > > > > >> Vinoth working on code review, tests for PR 1876,
> > > > > This is landed!
> > > > >
> > > > >
> > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <
> > bhavanisud...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > > >> We are targeting the end of this week to cut RC. Here is an update of
> > > > > >> where we are at release blockers.
> > > > >>
> > > > > >> 0.6.0 Release blocker status (board
> > > > > >> <https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69>),
> > > > >>
> > > > > >>- Spark Datasource/MOR https://github.com/apache/hudi/pull/1848 needs to
> > > > > >>be tested by gary/balaji (About to land)
> > > > > >>- Hive Sync restructuring (Review done, about to land)
> > > > > >>- Bootstrap
> > > > > >>  - Vinoth working on code review, tests for PR 1876,
> > > > > >>  - then udit will rework PR 1702 (In Code review)
> > > > > >>  - then we will review, land PR 1870, 1869
> > > > > >>- Bulk insert V2 PR 1834, lower risk, independent PR, well tested already
> > > > > >>  - Dependent PR 1149 to be landed,
> > > > > >>  - and modes to be respected in V2 impl as well (At risk)
> > > > > >>- Upgrade Downgrade Hooks, PR 1858 : (In Code review)
> > > > > >>- HUDI-1054- Marker list perf improvement, Udit has a PR out
> > > > > >>- HUDI-115 : Overwrite with... ordering issue, Sudha has a PR nearing
> > > > > >>landing
> > > > > >>- HUDI-1098 : Marker file issue with non-existent files. (In Code review)
> > > > >>- Spark Streaming + Async Compaction , test complete, code
> review
> > > > >>comments and l