For example, does the move of blooms into HFiles (a 0.10.0 feature) make
unique bloom keys mandatory?
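
To make the use case quoted below concrete, here is a minimal sketch of what a non-unique key implies for deletes. It is a toy in-memory model, not the Hudi API: the class `ToyKeyedTable`, its methods, and the sample customer and order values are all hypothetical and only illustrate the semantics, assuming a delete is resolved purely through the record key.

```python
from collections import defaultdict


class ToyKeyedTable:
    """Toy in-memory stand-in for a keyed table (hypothetical, NOT the Hudi API)."""

    def __init__(self):
        # record key -> list of rows stored under that key
        self._index = defaultdict(list)

    def insert(self, key, row):
        # Insert-only write: no uniqueness check, so duplicate keys accumulate.
        self._index[key].append(row)

    def delete(self, key):
        # Key lookup only, no scan of the rows; every row sharing the key goes.
        return len(self._index.pop(key, []))

    def count(self):
        return sum(len(rows) for rows in self._index.values())


orders = ToyKeyedTable()
orders.insert("cust-1", {"order_id": 1})
orders.insert("cust-1", {"order_id": 2})
orders.insert("cust-2", {"order_id": 3})

# Deleting by customer removes all of that customer's orders in one lookup,
# but there is no way to address a single order through the key.
removed = orders.delete("cust-1")
```

In this model the delete is cheap because it goes through the key alone, and the price is that one order can no longer be updated or deleted individually.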



On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
>
> > Are you asking if there are advantages to allowing duplicates or not having 
> > keys in your table?
> it's all about allowing duplicates
>
> The use case is, say, an Order table with key = customer_id, which
> then allows an indexed delete without prescanning the dataset.
>
> I wonder whether such a trick will cause trouble I am unaware of.
>
> On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > Hi,
> >
> > Are you asking if there are advantages to allowing duplicates or not
> > having keys in your table?
> >
> > Having keys helps with other practical scenarios, in addition to what
> > you called out.
> > E.g., oftentimes you would want to backfill an insert-only table and you
> > don't want to introduce duplicates when doing so.
> >
> > Thanks
> > Vinoth
> >
> > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <[email protected]>
> > wrote:
> >
> > > Hi devs,
> > >
> > > AFAIK, Hudi has been designed to use a primary key as the Hudi record
> > > key. However, it is also possible to choose a non-unique field. I have
> > > listed several problems with such a design:
> > >
> > > A non-unique key means you:
> > > - cannot delete / update a unique record
> > > - cannot apply a primary key for the new SQL tables feature
> > >
> > > Are there other downsides to choosing a non-unique key that you have in
> > > mind?
> > >
> > > In my case, having user_id as the Hudi key will help to apply deletion
> > > at the user level in any user table. The tables are insert-only, so the
> > > drawbacks listed above do not really apply. In case of errors in the
> > > tables I have several options:
> > >
> > > - rollback to a previous commit
> > > - read the partition, filter it, and overwrite the partition
> > >
> > > Thanks
> > >
