> In other words, we are generalizing this so Hudi feels more like
> MySQL and not HBase/Cassandra (key value store). That's the direction
> we are approaching.

Wow, this is amazing. I haven't yet found an RFC about this, nor a PR
that is ready to test.

This answers my initial question: with secondary indexes coming, the
Hudi key should be the primary key (if one exists). There is no reason
to choose anything else.
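
For reference, here is a minimal sketch of the write options behind the
key-based delete discussed further down in this thread. The option keys
are Hudi Spark datasource write configs; the table name, column names
and path are hypothetical and only for illustration. Whether every
record sharing a non-unique key gets removed is exactly the open
question, so take this as a sketch and not a recommendation.

```python
# Sketch only: builds the option map for a Hudi delete keyed on one field.
# "orders", "customer_id", "order_date" and "updated_at" are hypothetical names.
def hudi_delete_options(table_name, record_key, partition_field, precombine_field):
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "delete",
    }

opts = hudi_delete_options("orders", "customer_id", "order_date", "updated_at")

# With a SparkSession and a DataFrame `to_delete` holding the keys to remove,
# the write would look roughly like this (not executed here, path is made up):
# to_delete.write.format("hudi").options(**opts).mode("append").save("/tmp/orders")
```

Once secondary indexes are available, the record key field above would
simply be the primary key, with customer_id indexed separately.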

On Wed Nov 3, 2021 at 9:03 PM CET, Vinoth Chandar wrote:
> Hi.
>
> With the indexing approach we are taking, you should be able to add
> secondary indexes on any column, not just the key.
> In other words, we are generalizing this so Hudi feels more like MySQL
> and not HBase/Cassandra (key value store). That's the direction we are
> approaching.
>
> love to hear more feedback.
>
> On Tue, Nov 2, 2021 at 2:29 AM Nicolas Paris <[email protected]> wrote:
>
> > For example, does the move of blooms into HFiles (a 0.10.0 feature)
> > make unique bloom keys mandatory?
> >
> >
> >
> > On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
> > >
> > > > > Are you asking if there are advantages to allowing duplicates or not
> > > > > having keys in your table?
> > > It's all about allowing duplicates.
> > >
> > > The use case is, say, an Order table with key = customer_id, which
> > > then allows an indexed delete without needing to prescan the
> > > dataset.
> > >
> > > I wonder if there will be trouble I am unaware of with such a trick.
> > >
> > > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > > > Hi,
> > > >
> > > > Are you asking if there are advantages to allowing duplicates or
> > > > not having keys in your table?
> > > >
> > > > Having keys helps with other practical scenarios, in addition to
> > > > what you called out.
> > > > e.g.: Oftentimes, you would want to backfill an insert-only table
> > > > and you don't want to introduce duplicates when doing so.
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > AFAIK, Hudi has been designed to have a primary key as the Hudi
> > > > > key. However, it is also possible to choose a non-unique field. I
> > > > > have listed several problems with such a design:
> > > > >
> > > > > A non-unique key leads to:
> > > > > - cannot delete / update a unique record
> > > > > - cannot apply primary key for new sql tables feature
> > > > >
> > > > > Are there other downsides to choosing a non-unique key that you
> > > > > have in mind?
> > > > >
> > > > > In my case, having user_id as the Hudi key will help to apply
> > > > > deletions at the user level in any user table. The tables are
> > > > > insert-only, so the drawbacks listed above do not really apply. In
> > > > > case of errors in the tables, I have several options:
> > > > >
> > > > > - rollback to a previous commit
> > > > > - read partition/filter overwrite partition
> > > > >
> > > > > Thanks
> > > > >
> >
> >
