Hi. With the indexing approach we are taking, you should be able to add secondary indexes on any column, not just the key. In other words, we are generalizing this so Hudi feels more like MySQL and not HBase/Cassandra (key-value stores). That's the direction we are heading.
Love to hear more feedback.

On Tue, Nov 2, 2021 at 2:29 AM Nicolas Paris <nicolas.pa...@riseup.net> wrote:

> for example does the move of blooms into hfiles (0.10.0 feature) make
> unique bloom keys mandatory ?
>
> On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
> > > Are you asking if there are advantages to allowing duplicates or not
> > > having keys in your table?
> > it's all about allowing duplicates
> >
> > use case is say an Order table and choosing key = customer_id
> > then being able to do indexed delete without need of prescanning the
> > dataset
> >
> > I wonder if there will be trouble I am unaware of with such trick
> >
> > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > > Hi,
> > >
> > > Are you asking if there are advantages to allowing duplicates or not
> > > having keys in your table?
> > >
> > > Having keys helps with other practical scenarios, in addition to what
> > > you called out.
> > > e.g: Oftentimes, you would want to backfill an insert-only table and
> > > you don't want to introduce duplicates when doing so.
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <nicolas.pa...@riseup.net>
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > AFAIK, hudi has been designed to have primary keys in the hudi's key.
> > > > However it is possible to also choose a non unique field. I have
> > > > listed several troubles with such design:
> > > >
> > > > Non unique key leads to:
> > > > - cannot delete / update a unique record
> > > > - cannot apply primary key for new sql tables feature
> > > >
> > > > Are there other downsides to choosing a non unique key you have in
> > > > mind ?
> > > >
> > > > In my case, having user_id as a hudi key will help to apply deletion
> > > > on the user level in any user table. The tables are insert only, so
> > > > the drawbacks listed above do not really apply. In case of error in
> > > > the tables I have several options:
> > > >
> > > > - rollback to a previous commit
> > > > - read partition/filter overwrite partition
> > > >
> > > > Thanks