Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

nishith agarwal Thu, 24 Sep 2020 15:28:01 -0700

Yes, we have some ideas around schema evolution and have discussed them with
Balaji before as well. I'm going to put these thoughts down and share them on
the cWiki for all of us to jam on. Realistically, I don't think we can hit it
in 0.7.0. We already have a pretty strong list of items for 0.7.0.


Spark 3 SQL syntax like MERGE will definitely boost usability!

Thanks,
Nishith

On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar <[email protected]> wrote:

> On schema evolution, Nishith and Balaji were both thinking about this.
> Maybe there is a proposal in the works?
> I would guess we will not be able to hit it in 0.7.0 though. Maybe by the
> end of the year/0.8.0?
>
> Tanu, thanks for the kind words! Definitely, if we pull together, we will
> get there sooner. Looking forward to more contributions! :)
>
> >We were actually thinking of moving to Spark 3.0 but thought it’s too
> >early with 0.6 release. Is 0.6 not fully tested with Spark 3.0 ?
> That's correct. There is a PR already open for this. We expect this to be
> fixed in 0.6.1 shortly, and we will unlock Spark 3.0 support.
>
> 0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
> have had this either had an unfair head start or built ahead with Spark 3
> in mind. :))
> We will close this gap.
>
> On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <[email protected]>
> wrote:
>
> > +1 on the full schema evolution support. May I know which ticket this is
> > related to? thanks.
> >
> > On Wed, Sep 23, 2020 at 5:20 AM leesf <[email protected]> wrote:
> >
> > > Thanks Vinoth. We would also consider supporting full schema evolution
> > > (such as dropping some fields) in Hudi in 0.7.0, since right now Hudi
> > > follows Avro schema compatibility.
> > >
> > > On Wed, Sep 23, 2020 at 12:38 PM tanu dua <[email protected]> wrote:
> > >
> > > > Thanks Vinoth. These are really exciting items, and hats off to you and the
> > > > team for pushing the releases swiftly and improving the framework all the
> > > > time. I hope someday I will start contributing, once I am free from my major
> > > > deliverables and have a better understanding of the nitty-gritty details of
> > > > Hudi.
> > > >
> > > > You have mentioned Spark 3.0 support in the next release. We were actually
> > > > thinking of moving to Spark 3.0 but thought it’s too early with the 0.6
> > > > release. Is 0.6 not fully tested with Spark 3.0?
> > > >
> > > > On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <[email protected]> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > Pursuant to our conversation around release planning, I am happy to share
> > > > > the initial set of proposals for the next minor/major releases (minor
> > > > > release ofc can go out based on time)
> > > > >
> > > > > *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..)*
> > > > > Flink/Writer common refactoring for Flink
> > > > > Small file handling support w/o caching
> > > > > Spark3 Support
> > > > > Remaining bootstrap items
> > > > > Completing bulk_insertV2 (sort mode, de-dup etc)
> > > > > Full list here :
> > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348168
> > > > >
> > > > > *0.7.0 with major new features*
> > > > > RFC-15: metadata, range index (w/ spark support), bloom index (eliminate file listing, query pruning, improve bloom index perf)
> > > > > RFC-08: Record Index (to solve global index scalability/perf)
> > > > > RFC-18/19: Clustering/Insert overwrite
> > > > > Spark 3 based datasource rewrite (structured streaming sink/source, DELETE/MERGE)
> > > > > Incremental Query on logs (Hive, Spark)
> > > > > Parallel writing support
> > > > > Redesign of marker files for S3
> > > > > Stretch: ORC, PrestoSQL Support
> > > > >
> > > > > Full list here :
> > > > > https://issues.apache.org/jira/projects/HUDI/versions/12348721
> > > > >
> > > > > Please chime in with your thoughts. If you would like to commit to
> > > > > contributing a feature towards a release, please do so by marking the
> > > > > *`Fix Version/s`* field with that release number.
> > > > >
> > > > > Thanks
> > > > > Vinoth
