Re: [DISCUSS] Hudi 0.9.0 Release

2021-08-13 Thread Udit Mehrotra
Hi Community,

Here is a quick update on the 0.9.0 release status. Over the last 10 days we
made significant progress on the release blockers previously mentioned in
the thread, thanks to all the owners. Here are the remaining blockers that
we are currently tracking:

   - [HUDI-2305] Add MARKERS.type and fix marker-based rollback
   - [HUDI-2268] Add upgrade and downgrade to and from 0.9.0
   release-blockers
   - [HUDI-2307] When using delete_partition with ds should not rely on the
   primary key
   - [HUDI-2151] Flipping defaults
   - [HUDI-1897] Deltastreamer source for AWS S3
   - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
   - [HUDI-2119] Ensure the rolled-back instance was previously synced to
   the Metadata Table when syncing a Rollback Instant.

We plan to resolve these soon and cut an RC by *tomorrow (August 14th, 2021)
end of day PST*. If you have any other blockers that you would like to
surface for Hudi 0.9.0, feel free to reach out.

Thanks,
Udit

On Fri, Aug 6, 2021 at 1:53 AM sagar sumit wrote:

> Hi Udit, Vinoth
>
> End of next week sounds good. Apart from the issues listed, there is one
> more that we can take in this release:
> [HUDI-1897] DeltaStreamer Source for AWS S3
>
> It's under review and should be closed by early next week.
>
> Regards,
> Sagar
>
> On 2021/08/06 00:55:19, Raymond Xu wrote:
> > +1 End of next week
> >
> > On Thu, Aug 5, 2021 at 3:06 PM Sivabalan wrote:
> >
> > > Yeah, end of next week sounds good.
> > >
> > > Here are the status updates on the patches I am involved in.
> > >
> > > Plan to get these in by early next week:
> > >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
> > >    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
> > >    Sivabalan)
> > >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
> > >    pengzhiwei)
> > >    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
> > >    Ethan Guo)
> > >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> > >    Sivabalan)
> > >
> > > Mid next week:
> > >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With
> > >    Hudi (Owner: pengzhiwei)
> > >
> > > Waiting for reviews. Will try to get it in by early next week. If we
> > > can't get it in, we will probably skip it for this release.
> > >    - [HUDI-1763] Fixing honoring of Ordering val in
> > >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> > >
> > > Removed from release blockers:
> > >    - [HUDI-1887] Setting default value to false for enabling schema
> > >    post processor (Owner: Sivabalan)
> > >    - [HUDI-1850] Fixing read of a empty table but with failed write
> > >    (Owner: Sivabalan)
> > >
> > >
> > > > On Thu, Aug 5, 2021 at 11:17 AM Vinoth Chandar wrote:
> > >
> > > > Any other thoughts? Love to lock this date down sooner than later.
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > > On Tue, Aug 3, 2021 at 11:35 PM Udit Mehrotra wrote:
> > > > >
> > > > > > Agreed Vinoth. End of next week seems reasonable as a hard deadline
> > > > > > for cutting the RC.
> > > > > >
> > > > > > If anyone thinks otherwise or needs more time, feel free to chime in.
> > > > > >
> > > > > > On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar wrote:
> > > > >
> > > > > > > Thanks Udit! I propose we set end of next week as a hard deadline
> > > > > > > for cutting the RC. Any thoughts?
> > > > > > >
> > > > > > > A good amount of progress is being made on these blockers, I think.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra wrote:
> > > > > >
> > > > > > > > Hi Community,
> > > > > > > >
> > > > > > > > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > > > > > > > summary of the key features/improvements that would be going in the
> > > > > > > > release and the current blockers for everyone's visibility.
> > > > > > > >
> > > > > > > > *Highlights*
> > > > > > > >
> > > > > > > >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for
> > > > > > > >    Flink writer
> > > > > > > >    - [HUDI-1738] Detect and emit deleted records for Flink MOR
> > > > > > > >    table streaming read
> > > > > > > >    - [HUDI-1867] Support streaming reads for Flink COW table
> > > > > > > >    - [HUDI-1908] Global index for flink writer
> > > > > > > >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> > > > > > > >    - [HUDI-2209] Bulk insert for flink writer
> > > > > > > >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> > > > > > > >    Spark DataSource queries
> > > > > > > >    - [HUDI-1591] Partition pruning support for read optimized
> > > > > > > >    queries via Hudi Spark DataSource
> > > > > > > >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table
> > > > > > > >    with metastore. Queries via Spark SQL will be routed through Hudi
> > > > > > > >    DataSource

Re: How to read hudi files with Mapreduce?

2021-08-13 Thread Vinoth Chandar
Hi Jian,

We have a hoodie-hadoop-mr package with some InputFormat implementations.
You can try using HoodieParquetInputFormat to read from an MR job.
I have only tested it with Hive this way myself, so I am wondering if anyone
else here has real experience trying it with MR itself.
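
A rough, untested sketch of what such an MR job might look like is below.
It assumes the Hudi hadoop-mr classes (and the Hive/Parquet classes they
depend on) are on the job classpath, and that the input path points at the
table's partition directories. The class name HudiMrReadSketch, the mapper,
and the two path arguments are made up for illustration. Since
HoodieParquetInputFormat builds on Hive's MapredParquetInputFormat, the
sketch uses the older org.apache.hadoop.mapred API with NullWritable keys
and ArrayWritable values.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hudi.hadoop.HoodieParquetInputFormat;

// Hypothetical driver: a map-only job that dumps Hudi rows as tab-separated text.
public class HudiMrReadSketch {

  // Hypothetical mapper: each input value is one row, one Writable per column.
  public static class DumpMapper extends MapReduceBase
      implements Mapper<NullWritable, ArrayWritable, NullWritable, Text> {

    @Override
    public void map(NullWritable key, ArrayWritable row,
                    OutputCollector<NullWritable, Text> out, Reporter reporter)
        throws IOException {
      StringBuilder line = new StringBuilder();
      for (Writable column : row.get()) {
        if (line.length() > 0) {
          line.append('\t');
        }
        // Null columns may come back as null elements, so guard before toString().
        line.append(column == null ? "null" : column.toString());
      }
      out.collect(NullWritable.get(), new Text(line.toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(HudiMrReadSketch.class);
    conf.setJobName("hudi-mr-read-sketch");

    // Let Hudi pick the file versions to read instead of plain Parquet listing.
    conf.setInputFormat(HoodieParquetInputFormat.class);
    conf.setMapperClass(DumpMapper.class);
    conf.setNumReduceTasks(0); // map-only

    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(Text.class);

    // args[0]: partition path(s) of the Hudi table (assumption); args[1]: output dir.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
```

Depending on the Hive version on the classpath, the reader may also expect
Hive's column projection properties to be set in the JobConf, so treat this
as a starting point rather than a verified recipe.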

Thanks
Vinoth

On Wed, Aug 11, 2021 at 3:29 AM Jian Feng wrote:

> Hi all, can anyone give me a sample?
> --
>
> FengJian