Hi Felix, it looks like your use case would benefit from the virtual key
feature proposed in this RFC:

https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual

Once this is implemented, you won't have to create a separate record key field.
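
Until then, here is a rough Python sketch of the three approaches from your
mail below, just to make the comparison concrete. The function names, the "_"
separator handling, the choice of a name-based UUID (uuid5) and the namespace
are all my assumptions; your mail does not say how the UUID is derived, and
none of this is a Hudi API.

```python
import uuid

SEP = "_"
# Assumption: any fixed namespace works, as long as every writer uses the same one.
NAMESPACE = uuid.NAMESPACE_OID


def key_approach_1(f1: str, f2: str, f3: str, f4: str) -> str:
    """Approach 1: deterministic UUID over all four fields."""
    return str(uuid.uuid5(NAMESPACE, SEP.join([f1, f2, f3, f4])))


def key_approach_2(f1: str, f2: str, f3: str, f4: str) -> str:
    """Approach 2: UUID over F2..F4, with the datetime (F1) prepended."""
    suffix = uuid.uuid5(NAMESPACE, SEP.join([f2, f3, f4]))
    return f1 + SEP + str(suffix)


def key_approach_3(f1: str, f2: str, f3: str, f4: str) -> str:
    """Approach 3: composite key that keeps all four values readable.

    The "name:value" layout is only an illustration of a composite key.
    """
    pairs = zip(("F1", "F2", "F3", "F4"), (f1, f2, f3, f4))
    return ",".join(f"{name}:{value}" for name, value in pairs)
```

One difference the sketch makes visible: approaches 2 and 3 keep F1 readable
at the front of the key, so keys sort by origination time, while approach 1
produces keys with no ordering at all.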

A rough thought: you mentioned that 95% of writes go to the same partition.
Rather than tuning the record key, maybe consider improving the partition
field instead, e.g. to spread writes more evenly across partitions?
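
To illustrate what I mean by a more even partition field, here is a minimal
Python sketch that derives a partition path from the date plus a hash bucket.
The bucket count, the function name and the choice of the user identifier (F3)
as the bucketing field are assumptions on my side, not something Hudi
prescribes.

```python
import zlib
from datetime import datetime

NUM_BUCKETS = 8  # hypothetical value; would need tuning to the daily volume


def partition_path(origin: datetime, user_id: str,
                   num_buckets: int = NUM_BUCKETS) -> str:
    """Return '<date>/<bucket>' so one day's writes spread over several paths."""
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    bucket = zlib.crc32(user_id.encode("utf-8")) % num_buckets
    return f"{origin:%Y-%m-%d}/{bucket:02d}"
```

Bucketing on F3 would also mean that a delete by User Identifier only has to
look at one bucket per date, but whether that helps in practice depends on
your data.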

On Sat, Nov 14, 2020 at 8:46 PM Kizhakkel Jose, Felix
<[email protected]> wrote:

> Hello All,
>
> I have asked some general questions about the record key in the Slack
> channel, but I want to consolidate everything regarding the Record Key and
> the suggested best practices for constructing it to get better write
> performance.
>
> Table Type: COW
> Partition Path: Date
>
> My record uniqueness is derived from a combination of 4 fields:
>
>   1.  F1: Datetime (record’s origination datetime)
>   2.  F2: String   (11-character serial number)
>   3.  F3: UUID     (User Identifier)
>   4.  F4: String   (12-character statistic name)
>
> Note: My record is a nested document and some of the above fields are
> nested fields
>
> My Write Use Cases:
> 1. Writes to the partitioned HUDI table every 15 minutes, where
>    a. 95% are inserts and 5% are updates
>    b. 95% of writes go to the same partition (current date); the remaining
>       5% can span multiple partitions
> 2. GDPR requests to delete records from the table using the User Identifier
>    field (F3)
>
>
> Record Key Construction:
> Approach 1:
> Generate a UUID from the concatenated String of all 4 fields [eg:
> str(F1) + “_” + str(F2) + “_” + str(F3) + “_” + str(F4) ] and use that
> newly generated field as the Record Key
>
> Approach 2:
> Generate a UUID from the concatenated String of the 3 fields other than the
> datetime field (F1) [eg: str(F2) + “_” + str(F3) + “_” + str(F4)], prepend
> the datetime field to the generated UUID, and use that newly generated field
> as the Record Key, i.e. F1_<uuid>
>
> Approach 3:
> Record Key as a composite key of all 4 fields (F1, F2, F3, F4)
>
> Which approach would you suggest? Could you please help me?
>
> Regards,
> Felix K Jose
