In my experience, you need to follow some rules when evolving the schema and keep the data backwards compatible. The only other option is to rewrite the entire dataset :), which is very expensive.
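As an aside, what "backwards compatible" means here can be checked mechanically. Below is a minimal sketch using Avro's Java SchemaCompatibility API; the Trip record and fare field are made-up names for illustration. Adding a field with a default keeps old files readable, whereas adding it without a default would fail the check.

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatCheck {
  public static void main(String[] args) {
    // Writer schema: what the existing records were written with.
    Schema writer = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Trip\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"}]}");

    // Reader schema: adds a nullable field WITH a default, so records
    // written with the old schema still deserialize.
    Schema reader = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Trip\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":[\"null\",\"double\"],"
            + "\"default\":null}]}");

    // Prints COMPATIBLE; dropping the "default" above would make it
    // INCOMPATIBLE, since old records carry no value for the new field.
    System.out.println(SchemaCompatibility
        .checkReaderWriterCompatibility(reader, writer)
        .getType());
  }
}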
If you have some pointers to learn more about any approach you are
suggesting, happy to read up.

On Wed, Jan 1, 2020 at 10:26 PM Pratyaksh Sharma <[email protected]> wrote:

> Hi Vinoth,
>
> As you explained above, and as per what is mentioned in this FAQ (
> https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory
> ), Hudi is able to maintain schema evolution only if the schema is
> *backwards compatible*. What about the case when it is backwards
> incompatible? This might be the case when, for some reason, you are
> unable to enforce things like not deleting fields or not changing their
> order. Ideally we should be foolproof and able to support schema
> evolution in every case possible. In such a case, creating an uber
> schema can be useful. WDYT?
>
> On Wed, Jan 1, 2020 at 12:49 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Syed,
> >
> > Typically, I have seen the Confluent/Avro schema registry used as the
> > source of truth, with the Hive schema being just a translation. That's
> > how the hudi-hive sync also works.
> > Have you considered making fields optional in the Avro schema, so that
> > even if the source data does not have a few of them, they will be
> > null?
> > In general, the two places where I have dealt with this both made it
> > work using the schema evolution rules Avro supports, and by enforcing
> > things like not deleting fields, not changing order, etc.
> >
> > Hope that at least helps a bit.
> >
> > thanks
> > vinoth
> >
> > On Sun, Dec 29, 2019 at 11:55 PM Syed Abdul Kather <[email protected]>
> > wrote:
> >
> > > Hi Team,
> > >
> > > We pull data from Kafka, generated by Debezium. The schema is
> > > maintained in the schema registry by the Confluent framework as the
> > > data is populated.
> > >
> > > *Problem Statement:*
> > >
> > > All additions/deletions of columns are tracked in the schema
> > > registry. When running the Hudi pipeline, we have a custom schema
> > > registry implementation that pulls the latest schema from both the
> > > schema registry and the Hive metastore, and we create an uber schema
> > > (so that columns missing from the schema registry will be pulled
> > > from the Hive metastore). Is there a better approach to solve this
> > > problem?
> > >
> > > Thanks and Regards,
> > > S SYED ABDUL KATHER
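For what it's worth, here is a rough sketch of the uber-schema idea discussed in the thread, assuming Avro 1.9+'s Java API. UberSchema and merge are hypothetical names, and this is only an illustration of the approach, not Hudi's or Syed's actual implementation: fields still present in the registry schema are kept as-is, and columns that now exist only on the Hive side are re-added as nullable with a null default.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.avro.JsonProperties;
import org.apache.avro.Schema;

public class UberSchema {

  // Merge the latest registry schema with the (possibly wider) Hive
  // schema. Registry fields win; columns that now exist only in Hive are
  // re-added as nullable with a null default. Fresh Field objects are
  // created because Avro forbids reusing a Field across two records.
  static Schema merge(Schema registry, Schema hive) {
    Map<String, Schema.Field> fields = new LinkedHashMap<>();
    for (Schema.Field f : registry.getFields()) {
      fields.put(f.name(),
          new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
    }
    for (Schema.Field f : hive.getFields()) {
      if (!fields.containsKey(f.name())) {
        // Wrap non-union types as ["null", type] so records that no
        // longer carry this column decode to null.
        Schema type = f.schema().getType() == Schema.Type.UNION
            ? f.schema()
            : Schema.createUnion(Schema.create(Schema.Type.NULL), f.schema());
        fields.put(f.name(),
            new Schema.Field(f.name(), type, f.doc(), JsonProperties.NULL_VALUE));
      }
    }
    return Schema.createRecord(registry.getName(), registry.getDoc(),
        registry.getNamespace(), false, new ArrayList<>(fields.values()));
  }
}

One caveat, per the earlier point in the thread: once dropped columns are resurrected this way, every downstream reader has to tolerate nulls in them, so the evolution rules are still worth enforcing at the source where possible.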
