Hi Alexey,

Thanks for the update!

So for maximum performance when writing to Hudi from a low-level (DataStream, not Table) Flink workflow, we’d be creating RowData records?
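To make that concrete, here's a rough sketch of what I have in mind: building GenericRowData directly in a map function so records go straight into Flink's internal representation, with no Avro hop. I'm assuming the Hudi Flink sink can take a DataStream<RowData>; the HoodiePipeline wiring mentioned in the comment is just my guess, not something from the RFC.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;

public class RowDataSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical source of "id,name" events.
        DataStream<String> source = env.fromElements("1,alice", "2,bob");

        // Build Flink's internal RowData directly, so no Avro intermediate is needed.
        DataStream<RowData> rows = source.map(line -> {
            String[] fields = line.split(",");
            GenericRowData row = new GenericRowData(2);
            row.setField(0, StringData.fromString(fields[0]));
            row.setField(1, StringData.fromString(fields[1]));
            return (RowData) row;
        }).returns(RowData.class);
        // In a real job you'd probably want InternalTypeInfo.of(rowType) for the
        // return type instead of the generic RowData.class hint.

        // The RowData stream would then go to the Hudi sink, e.g. something like
        // HoodiePipeline.builder("my_table")...sink(rows, false); the exact wiring
        // is version-dependent and just a guess on my part.

        env.execute("rowdata-sketch");
    }
}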
— Ken

> On Sep 27, 2022, at 2:08 PM, Alexey Kudinkin <[email protected]> wrote:
>
> Hello, everyone!
>
> As you might be aware, the community has been very busy at work on RFC-46,
> which aims to bring a long-awaited, cutting-edge level of performance to Hudi
> by avoiding Avro as an intermediate representation and instead relying on
> individual engines to host data in their own formats (InternalRow for Spark,
> RowData for Flink, etc.).
>
> We wanted to share an update on where we are and what the next steps are
> from here:
>
>    - We're very close to completing the work and are already preparing to
>    land the complete implementation of Phase 1 of RFC-46, currently being
>    developed in a feature branch
>    <https://github.com/apache/hudi/tree/release-feature-rfc46>.
>    - To successfully merge a change of this scale, we will have to do a
>    *code freeze* on the master branch, barring any changes from landing
>    before we're able to merge the feature branch.
>    - To make sure this activity doesn't interrupt the 0.12.1 release
>    currently in progress, we're tentatively planning to schedule the code
>    freeze *after* the release process has been successfully finalized, with
>    the RC branch cut and validated for release. As of now, provided the RC
>    candidate is cut tomorrow on 09/28, we're aiming to schedule a merge
>    attempt somewhere mid to late next week.
>    - We will follow up on this thread separately at least *24h* before the
>    scheduled code freeze with an exact date and time frame for it. Stay tuned.
>
>
> Alexey, on behalf of the RFC-46 group

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
