In my experience, you need to follow some rules when evolving the schema and keep the data backwards compatible. The only other option is to rewrite the entire dataset :), which is very expensive.
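As an aside, what "backwards compatible" means here can be checked mechanically. Below is a minimal sketch using Avro's Java SchemaCompatibility API; the Trip record and fare field are made-up names for illustration. Adding a field with a default keeps old files readable, whereas adding it without a default would fail the check.

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatCheck {
  public static void main(String[] args) {
    // Writer schema: what the existing records were written with.
    Schema writer = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Trip\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"}]}");

    // Reader schema: adds a nullable field WITH a default, so records
    // written with the old schema still deserialize.
    Schema reader = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Trip\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":[\"null\",\"double\"],"
            + "\"default\":null}]}");

    // Prints COMPATIBLE; dropping the "default" above would make it
    // INCOMPATIBLE, since old records carry no value for the new field.
    System.out.println(SchemaCompatibility
        .checkReaderWriterCompatibility(reader, writer)
        .getType());
  }
}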
If you have some pointers to learn more about any approach you are
suggesting, happy to read up.

On Wed, Jan 1, 2020 at 10:26 PM Pratyaksh Sharma <[email protected]> wrote:

> Hi Vinoth,
>
> As you explained above, and as per what is mentioned in this FAQ (
> https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory
> ), Hudi is able to maintain schema evolution only if the schema is
> *backwards compatible*. What about the case when it is backwards
> incompatible? This might be the case when, for some reason, you are
> unable to enforce things like not deleting fields or not changing their
> order. Ideally we should be foolproof and able to support schema
> evolution in every case possible. In such a case, creating an uber
> schema can be useful. WDYT?
>
> On Wed, Jan 1, 2020 at 12:49 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Syed,
> >
> > Typically, I have seen the Confluent/Avro schema registry used as the
> > source of truth, with the Hive schema being just a translation. That's
> > how the hudi-hive sync also works.
> > Have you considered making fields optional in the Avro schema, so that
> > even if the source data does not have a few of them, they will be
> > null?
> > In general, the two places where I have dealt with this both made it
> > work using the schema evolution rules Avro supports, and by enforcing
> > things like not deleting fields, not changing order, etc.
> >
> > Hope that at least helps a bit.
> >
> > thanks
> > vinoth
> >
> > On Sun, Dec 29, 2019 at 11:55 PM Syed Abdul Kather <[email protected]>
> > wrote:
> >
> > > Hi Team,
> > >
> > > We pull data from Kafka, generated by Debezium. The schema is
> > > maintained in the schema registry by the Confluent framework as the
> > > data is populated.
> > >
> > > *Problem Statement:*
> > >
> > > All additions/deletions of columns are tracked in the schema
> > > registry. When running the Hudi pipeline, we have a custom schema
> > > registry implementation that pulls the latest schema from both the
> > > schema registry and the Hive metastore, and we create an uber schema
> > > (so that columns missing from the schema registry will be pulled
> > > from the Hive metastore). Is there a better approach to solve this
> > > problem?
> > >
> > > Thanks and Regards,
> > > S SYED ABDUL KATHER
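For what it's worth, here is a rough sketch of the uber-schema idea discussed in the thread, assuming Avro 1.9+'s Java API. UberSchema and merge are hypothetical names, and this is only an illustration of the approach, not Hudi's or Syed's actual implementation: fields still present in the registry schema are kept as-is, and columns that now exist only on the Hive side are re-added as nullable with a null default.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.avro.JsonProperties;
import org.apache.avro.Schema;

public class UberSchema {

  // Merge the latest registry schema with the (possibly wider) Hive
  // schema. Registry fields win; columns that now exist only in Hive are
  // re-added as nullable with a null default. Fresh Field objects are
  // created because Avro forbids reusing a Field across two records.
  static Schema merge(Schema registry, Schema hive) {
    Map<String, Schema.Field> fields = new LinkedHashMap<>();
    for (Schema.Field f : registry.getFields()) {
      fields.put(f.name(),
          new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
    }
    for (Schema.Field f : hive.getFields()) {
      if (!fields.containsKey(f.name())) {
        // Wrap non-union types as ["null", type] so records that no
        // longer carry this column decode to null.
        Schema type = f.schema().getType() == Schema.Type.UNION
            ? f.schema()
            : Schema.createUnion(Schema.create(Schema.Type.NULL), f.schema());
        fields.put(f.name(),
            new Schema.Field(f.name(), type, f.doc(), JsonProperties.NULL_VALUE));
      }
    }
    return Schema.createRecord(registry.getName(), registry.getDoc(),
        registry.getNamespace(), false, new ArrayList<>(fields.values()));
  }
}

One caveat, per the earlier point in the thread: once dropped columns are resurrected this way, every downstream reader has to tolerate nulls in them, so the evolution rules are still worth enforcing at the source where possible.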
