[jira] [Commented] (KUDU-3353) Support setnx semantic on column

Yingchun Lai (Jira) Wed, 20 Jul 2022 03:34:11 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568976#comment-17568976
 ]


Yingchun Lai commented on KUDU-3353:
------------------------------------

Let me clarify some use cases:

A user profile table in Kudu has a column "first_login_ts" that represents the 
time of the user's first login to the website. The table is populated by 
upserting user event logs; each log contains the user's id, some attributes, 
and "first_login_ts". The first_login_ts value is filled with the time the log 
was produced, so for a given user, successive event logs carry different 
(ever-increasing) "first_login_ts" values. Only the first one should be set; 
later logs should not update it.

 

The updated design:

1. Add a column attribute that defines a column as IMMUTABLE, meaning the 
column's cell value cannot be updated after it has been written when the row 
is inserted.

2. Use UPDATE_IGNORE and add UPSERT_IGNORE: these behave like UPDATE and UPSERT 
but ignore update errors on IMMUTABLE columns.
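
To make the intended behavior concrete, here is a minimal Python sketch of the 
semantics described above. It is an in-memory toy, not Kudu code: the ToyTable 
class, its upsert_ignore method, and the "city" column are hypothetical names 
made up for illustration, and it assumes that any write to an IMMUTABLE column 
of an already-existing row is silently skipped.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a table (hypothetical, not Kudu's API)."""
    immutable_columns: set
    rows: dict = field(default_factory=dict)  # primary key -> {column: value}

    def upsert_ignore(self, key, values):
        """Insert the row if absent; otherwise update only mutable columns.

        Returns the names of the columns whose updates were ignored.
        """
        if key not in self.rows:
            # Insert path: IMMUTABLE columns may be written once, here.
            self.rows[key] = dict(values)
            return []
        row = self.rows[key]
        ignored = []
        for column, value in values.items():
            if column in self.immutable_columns:
                ignored.append(column)  # keep the value written at insert time
            else:
                row[column] = value
        return ignored


table = ToyTable(immutable_columns={"first_login_ts"})
table.upsert_ignore("user1", {"city": "Beijing", "first_login_ts": 100})
ignored = table.upsert_ignore("user1", {"city": "Shanghai", "first_login_ts": 250})
```

In this sketch the second call updates "city" but leaves "first_login_ts" at 
the value from the first event log, which matches the use case above.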

> Support setnx semantic on column
> --------------------------------
>
>                 Key: KUDU-3353
>                 URL: https://issues.apache.org/jira/browse/KUDU-3353
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, server
>            Reporter: Yingchun Lai
>            Assignee: Yingchun Lai
>            Priority: Major
>
> h1. motivation
> In some usage scenarios, a Kudu table has a column with "create time" 
> semantics, meaning it represents the creation timestamp of the row. The other 
> columns keep the usual semantics, for example, user properties such as age 
> and address.
> Neither the upstream nor the Kudu user knows whether a row exists, and every 
> cell holds the latest value ingested from, for example, an event stream.
> Without the "create time" column, the Kudu user can use UPSERT operations to 
> write data to the table, and every column with data overwrites the old data. 
> But with the "create time" column, the cell data will be overwritten by 
> subsequent UPSERT ops, which is not what we expect.
> To achieve the goal, we have to read the column to check whether it is NULL: 
> if it is NULL, we fill the cell in the row; if it is not NULL, we drop it 
> from the data before the UPSERT to avoid overwriting "create time".
> This is expensive. Is there a way to avoid the read from Kudu?
> h1. Resolution
> We can implement a column schema with "update if null" semantics. That means 
> cell data in the changelist updates the base data if the latter is NULL, and 
> the update is ignored if it is not NULL.
> So we can use Kudu as before, except that the column is defined as 
> "update if null" when creating the table or adding the column.
>  
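
The "update if null" rule from the quoted description can be sketched in a few 
lines of Python. This is only an illustration of the rule under stated 
assumptions, not Kudu's changelist implementation: apply_update and the row 
layout are hypothetical, and None stands in for NULL.

```python
def apply_update(base_row, changes, update_if_null_columns):
    """Return a new row with `changes` applied on top of `base_row`.

    For columns in `update_if_null_columns`, the change is applied only
    when the base cell is NULL (None); otherwise it is ignored.
    """
    merged = dict(base_row)
    for column, value in changes.items():
        if column in update_if_null_columns and merged.get(column) is not None:
            continue  # base cell already set: ignore the update
        merged[column] = value
    return merged


row = {"age": 30, "create_time": None}
row = apply_update(row, {"age": 31, "create_time": 1000}, {"create_time"})
row = apply_update(row, {"age": 32, "create_time": 2000}, {"create_time"})
```

After both updates, "age" holds the latest value while "create_time" keeps the 
first non-NULL value, without the writer having to read the row first.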



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
