Thanks for pointing me to the RFC! When using Spark to write a table, we need to launch several Spark jobs, e.g. for index lookup and location tagging, workload profiling, etc. RFC-13 aims to encapsulate all of these in a single Flink DAG, right? Is there a plan for how to achieve this?

On Tue, Sep 29, 2020 at 9:40 AM 王** <[email protected]> wrote:

> Hi Rui
> Thanks for asking, the design for Flink integration can be found here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520
> please ping me if you have any questions.
>
> At 2020-09-28 20:43:22, "Rui Li" <[email protected]> wrote:
>> Hello,
>>
>> Very excited to see the on-going efforts for Flink integration. I wonder
>> whether there's a design doc for this feature? I would like to learn more
>> and hopefully to make some contributions.
>>
>> On Fri, Sep 25, 2020 at 6:27 AM nishith agarwal <[email protected]> wrote:
>>
>>> Yes, we have some ideas around schema evolution and have discussed with
>>> Balaji before as well. I'm going to put these thoughts down and share it
>>> on the cWiki for all of us to jam. Realistically, I don't think we can
>>> hit in 0.7.0. We already have a pretty strong list of items for 0.7.0.
>>>
>>> Spark 3 SQL syntax like MERGE will definitely boost usability!
>>>
>>> Thanks,
>>> Nishith
>>>
>>> On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar <[email protected]> wrote:
>>>
>>>> On schema evolution, Nishith and Balaji were both thinking about this.
>>>> May be there is a proposal in works?
>>>> I would guess we will not be able to hit it in 0.7.0 though. Maybe by
>>>> the end of year/0.8.0?
>>>>
>>>> Tanu, thanks for the kind words! def, if we pull together, we will
>>>> reach there sooner. Looking forward to more contributions! :)
>>>>
>>>> >We were actually thinking of moving to Spark 3.0 but thought it’s too
>>>> >early with 0.6 release. Is 0.6 not fully tested with Spark 3.0 ?
>>>> That's correct. There is a PR already open for this. We expect this to
>>>> be fixed in 0.6.1 shortly and we will unlock spark 3.0 support
>>>>
>>>> 0.7.0 will bring spark 3 SQL syntax like MERGE etc. (Other systems that
>>>> have had this, either had an unfair head start or built ahead with
>>>> spark 3 in mind. :))
>>>> We will close this gap down.
>>>>
>>>> On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <[email protected]> wrote:
>>>>
>>>>> +1 on the full schema evolution support. May I know which ticket this
>>>>> is related to? thanks.
>>>>>
>>>>> On Wed, Sep 23, 2020 at 5:20 AM leesf <[email protected]> wrote:
>>>>>
>>>>>> Thanks Vinoth, also we would consider support full schema
>>>>>> evolution(such as drop some fields) of hudi in 0.7.0, since right now
>>>>>> hudi follows avro schema compatibility
>>>>>>
>>>>>> On Wed, Sep 23, 2020 at 12:38 PM, tanu dua <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Vinoth. These are really exciting items and hats off to you
>>>>>>> and team in pushing the releases swiftly and improving the framework
>>>>>>> all the time. I hope someday I will start contributing once I will
>>>>>>> get free from my major deliverables and have more understanding the
>>>>>>> nitty gritty details of Hudi.
>>>>>>>
>>>>>>> You have mentioned Spark3.0 support in next release. We were
>>>>>>> actually thinking of moving to Spark 3.0 but thought it’s too early
>>>>>>> with 0.6 release. Is 0.6 not fully tested with Spark 3.0 ?
>>>>>>>
>>>>>>> On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> Pursuant to our conversation around release planning, I am happy to
>>>>>>>> share the initial set of proposals for the next minor/major
>>>>>>>> releases (minor release ofc can go out based on time)
>>>>>>>>
>>>>>>>> *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..)*
>>>>>>>> Flink/Writer common refactoring for Flink
>>>>>>>> Small file handling support w/o caching
>>>>>>>> Spark3 Support
>>>>>>>> Remaining bootstrap items
>>>>>>>> Completing bulk_insertV2 (sort mode, de-dup etc)
>>>>>>>> Full list here :
>>>>>>>> https://issues.apache.org/jira/projects/HUDI/versions/12348168
>>>>>>>>
>>>>>>>> *0.7.0 with major new features*
>>>>>>>> RFC-15: metadata, range index (w/ spark support), bloom index
>>>>>>>> (eliminate file listing, query pruning, improve bloom index perf)
>>>>>>>> RFC-08: Record Index (to solve global index scalability/perf)
>>>>>>>> RFC-18/19: Clustering/Insert overwrite
>>>>>>>> Spark 3 based datasource rewrite (structured streaming sink/source,
>>>>>>>> DELETE/MERGE)
>>>>>>>> Incremental Query on logs (Hive, Spark)
>>>>>>>> Parallel writing support
>>>>>>>> Redesign of marker files for S3
>>>>>>>> Stretch: ORC, PrestoSQL Support
>>>>>>>>
>>>>>>>> Full list here :
>>>>>>>> https://issues.apache.org/jira/projects/HUDI/versions/12348721
>>>>>>>>
>>>>>>>> Please chime in with your thoughts. If you would like to commit to
>>>>>>>> contributing a feature towards a release, please do so by marking
>>>>>>>> *`Fix Version/s`* field with that release number.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinoth
>>
>> --
>> Cheers,
>> Rui Li

--
Best regards!

Rui Li
