> Are you asking if there are advantages to allowing duplicates or not having
> keys in your table?

It's all about allowing duplicates.
The use case is, say, an Order table where I choose key = customer_id and can
then do an indexed delete without needing to prescan the dataset.

I wonder if there will be trouble I am unaware of with such a trick.

On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> Hi,
>
> Are you asking if there are advantages to allowing duplicates or not
> having keys in your table?
>
> Having keys helps with other practical scenarios, in addition to what
> you called out.
> e.g.: Oftentimes, you would want to backfill an insert-only table and you
> don't want to introduce duplicates when doing so.
>
> Thanks
> Vinoth
>
> On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <[email protected]>
> wrote:
>
> > Hi devs,
> >
> > AFAIK, Hudi has been designed to have a primary key as the Hudi key.
> > However, it is also possible to choose a non-unique field. I have listed
> > several problems with such a design.
> >
> > A non-unique key means you:
> > - cannot delete / update a single record
> > - cannot apply a primary key for the new SQL tables feature
> >
> > Are there other downsides to choosing a non-unique key that you have
> > in mind?
> >
> > In my case, having user_id as the Hudi key will help to apply deletion at
> > the user level in any user table. The tables are insert-only, so the
> > drawbacks listed above do not really apply. In case of error in the
> > tables I have several options:
> >
> > - roll back to a previous commit
> > - read the partition, filter it, and overwrite the partition
> >
> > Thanks
> >
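
To make the "indexed delete" idea in this thread concrete, here is a minimal sketch in plain Python. It is not Hudi's API and does not reflect how Hudi implements its index: `KeyIndexedTable`, `insert`, `delete_by_key`, and the file ids are hypothetical names chosen for illustration. It assumes a key-to-file index, so a delete on a non-unique key (here a customer id) only visits the files that hold that key instead of scanning the whole dataset.

```python
from collections import defaultdict


class KeyIndexedTable:
    """Toy stand-in for a keyed table. NOT Hudi's API, only an illustration."""

    def __init__(self):
        # file_id -> list of (key, row) records stored in that file
        self.files = defaultdict(list)
        # key -> set of file_ids containing at least one record with that key
        self.index = defaultdict(set)

    def insert(self, file_id, key, row):
        # Duplicate keys are allowed: several rows may share one key.
        self.files[file_id].append((key, row))
        self.index[key].add(file_id)

    def delete_by_key(self, key):
        """Delete every record sharing `key`.

        Returns (files_touched, records_deleted). Only files listed in the
        index for this key are rewritten; other files are never read.
        """
        touched = self.index.pop(key, set())
        deleted = 0
        for file_id in touched:
            kept = [(k, r) for (k, r) in self.files[file_id] if k != key]
            deleted += len(self.files[file_id]) - len(kept)
            self.files[file_id] = kept
        return len(touched), deleted


# Hypothetical Order table keyed by customer id (non-unique).
table = KeyIndexedTable()
table.insert("f1", "cust_1", {"order_id": 100})
table.insert("f1", "cust_2", {"order_id": 101})
table.insert("f2", "cust_1", {"order_id": 102})
table.insert("f3", "cust_3", {"order_id": 103})

# Removes both orders of cust_1 and never reads file f3.
files_touched, records_deleted = table.delete_by_key("cust_1")
```

The same sketch also shows the drawback listed in the original message: with the customer id as the key, a single order of `cust_1` cannot be addressed by key alone, because the key identifies all of that customer's records at once.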
