[ 
https://issues.apache.org/jira/browse/HUDI-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Feng updated HUDI-4882:
----------------------------
    Description: 
For example, we have two sources and one target table:
* source1's fields: *id, ts, name*
* source2's fields: *id, ts, price*
* target table's fields: *id, ts, name, price*

ts is the precombine field.


In the 1st batch, we get one record from each source:
   Source1:
       id      ts      name
       1       1       name_1
   Source2:
       id      ts      price
       1       3       price_3
so the record in the target table should be:
       id      ts      name      price
       1       3       name_1    price_3

Let's say that in the 2nd batch we get one event from source1:
   Source1:
       id      ts      name
       1       2       name_2

name_2 won't be written to the target table, since its ts value (2) is smaller
than the ts value already in the target table (3), even though it is newer than
the previous source1 record (ts 1).
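
The behaviour described above can be reproduced with a minimal sketch. This is
plain Python over dicts, not Hudi's payload code; the function name
merge_single_ordering is hypothetical, and the rule "reject any incoming record
whose ts is smaller than the stored ts" is an assumption taken from the example.

```python
# Hypothetical sketch of a partial update guarded by ONE table-wide ordering
# field. It only illustrates the example above; it is not Hudi's implementation.
def merge_single_ordering(existing, incoming, ordering_field="ts"):
    if existing is None:
        return dict(incoming)
    # An incoming record that is older than the stored row is dropped entirely.
    if incoming[ordering_field] < existing[ordering_field]:
        return dict(existing)
    merged = dict(existing)
    # Partial update: only the columns present in the incoming record change.
    merged.update({k: v for k, v in incoming.items() if v is not None})
    return merged


target = None
# 1st batch: one record from each source.
target = merge_single_ordering(target, {"id": 1, "ts": 1, "name": "name_1"})
target = merge_single_ordering(target, {"id": 1, "ts": 3, "price": "price_3"})
# 2nd batch: source1 sends ts=2, which is smaller than the stored ts=3.
target = merge_single_ordering(target, {"id": 1, "ts": 2, "name": "name_2"})
print(target)
```

The stored row keeps name_1, because the ts=3 written by source2 blocks the
later source1 event.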

This feature will allow users to perform partial updates across 
sub-tables/sources by determining the state of a set of columns in a row based 
on an ordering/precombine column.

As such, a table can have MULTIPLE ordering fields.

This use case is suitable for wide Hudi tables that are created from smaller 
sub-tables, where each of its sub-tables has its own precombine column, and 
where its records could be upserted out of order.
 !image-2022-09-20-22-46-52-907.png! 
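
One way to picture the proposal is the sketch below, again in plain Python
rather than Hudi code. It assumes that each source carries its own ordering
column and that the target table stores one ordering column per column group.
The names name_ts, price_ts, ORDERING_GROUPS and merge_multi_ordering are
hypothetical and do not come from the issue.

```python
# Hypothetical mapping: ordering field -> the columns whose state it decides.
ORDERING_GROUPS = {"name_ts": ["name"], "price_ts": ["price"]}


def merge_multi_ordering(existing, incoming, groups=ORDERING_GROUPS):
    merged = dict(existing) if existing else {}
    grouped = set(groups) | {c for cols in groups.values() for c in cols}
    # Columns outside every group (here the record key) are set once.
    for key, value in incoming.items():
        if key not in grouped:
            merged.setdefault(key, value)
    for order_col, cols in groups.items():
        new_ts = incoming.get(order_col)
        if new_ts is None:
            continue  # this source does not contribute to this group
        old_ts = merged.get(order_col)
        # Each group is compared only against its own ordering value.
        if old_ts is None or new_ts >= old_ts:
            merged[order_col] = new_ts
            for col in cols:
                merged[col] = incoming.get(col)
    return merged


row = None
row = merge_multi_ordering(row, {"id": 1, "name_ts": 1, "name": "name_1"})
row = merge_multi_ordering(row, {"id": 1, "price_ts": 3, "price": "price_3"})
# The source1 event from the 2nd batch is now accepted: 2 >= 1 in its own group.
row = merge_multi_ordering(row, {"id": 1, "name_ts": 2, "name": "name_2"})
# A genuinely out-of-order source1 event is still ignored: 1 < 2.
row = merge_multi_ordering(row, {"id": 1, "name_ts": 1, "name": "name_0"})
print(row)
```

Under these assumptions the price column keeps the value ordered by price_ts,
while the name column advances independently according to name_ts.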



> Multiple ordering fields for partial update to handle out-of-order events
> -------------------------------------------------------------------------
>
>                 Key: HUDI-4882
>                 URL: https://issues.apache.org/jira/browse/HUDI-4882
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jian Feng
>            Priority: Major
>         Attachments: image-2022-09-20-22-42-19-445.png, 
> image-2022-09-20-22-46-52-907.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
