The read query pattern will be (partition key + primary key minus timestamp), where my primary key is domain keys + timestamp.
Read/write queries vary per dataset, but mostly all the tables are read and
written frequently and about equally. Reads will mostly be done by providing
the partitions, not by a blanket query. If we have to choose between read and
write, I will choose write, but I want to stick only with the COW table type.
Please let me know if you need more information.

On Thu, 15 Oct 2020 at 5:48 PM, Sivabalan <[email protected]> wrote:

> Can you give us a sense of how your read workload looks like? Depending on
> that, read perf could vary.
>
> On Thu, Oct 15, 2020 at 4:06 AM Tanuj <[email protected]> wrote:
>
> > Hi all,
> > We don't have an "UPDATE" use case and all ingested rows will be "INSERT",
> > so what is the best way to define the primary key? As of now we have
> > designed the primary key as per the domain object with create_date, which
> > is: <domain_object_key_1>,<domain_object_key_2>,<create_date>
> >
> > Since it's always an INSERT for us, I could potentially use a UUID as well.
> >
> > We use keys for the Bloom Index in Hudi, so I just wanted to know whether
> > I would get better write performance with a UUID vs. composite domain keys.
> >
> > I believe reads are not impacted by the primary key, as it is not being
> > considered?
> >
> > Please suggest.
>
> --
> Regards,
> -Sivabalan
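
For concreteness, a minimal PySpark sketch of the insert-only, COW, Bloom-index
setup discussed above. The table name (events), the domain key columns
(domain_key_1, domain_key_2), create_date, the partition column (part_col),
the base path, and the sample filter values are illustrative assumptions, not
taken from the thread.

from pyspark.sql import SparkSession

# Assumes the Hudi Spark bundle is on the classpath (e.g. via --packages);
# Hudi requires the Kryo serializer.
spark = (SparkSession.builder
         .appName("hudi-insert-sketch")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

# Hypothetical input row with the assumed column names.
df = spark.createDataFrame(
    [("a", "b", "2020-10-15 00:00:00", "2020-10-15")],
    ["domain_key_1", "domain_key_2", "create_date", "part_col"],
)

hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    # Insert-only workload, no upserts.
    "hoodie.datasource.write.operation": "insert",
    # Composite record key: domain keys + timestamp, as described above.
    "hoodie.datasource.write.recordkey.field": "domain_key_1,domain_key_2,create_date",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.partitionpath.field": "part_col",
    "hoodie.datasource.write.precombine.field": "create_date",
    "hoodie.index.type": "BLOOM",
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("/tmp/hudi/events"))

# Read side: prune by partition first, then filter on the domain keys
# (primary key minus the timestamp), rather than running a blanket scan.
reads = (spark.read.format("hudi")
         .load("/tmp/hudi/events")
         .where("part_col = '2020-10-15' AND domain_key_1 = 'a' AND domain_key_2 = 'b'"))
reads.show()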
