[ 
https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578213#comment-17578213
 ] 

ASF subversion and git services commented on KUDU-3353:
-------------------------------------------------------

Commit b6eedb224f715ad86378a92d25f09c2084b0e2b7 in kudu's branch 
refs/heads/master from Yingchun Lai
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b6eedb224 ]

KUDU-3353 [schema] Add an immutable attribute on column schema (part 1)

Design overview:
1. Add a new column attribute, IMMUTABLE, meaning that the column's
   cell value cannot be updated once it has been written when the
   row is inserted.
2. An attempt to modify an immutable cell of an existing row with an
   UPDATE or UPSERT operation fails with an error status that the
   newly added Status::IsImmutable() identifies.
3. Extend UPDATE_IGNORE and add UPSERT_IGNORE: these behave like
   UPDATE and UPSERT but ignore update errors on IMMUTABLE columns.
   The rest of the columns are updated according to the operation's
   data; only the immutable columns are left unchanged. With this
   change, UPDATE_IGNORE ops ignore both 'key not found' and
   'update on immutable column' errors.
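
The three points above can be sketched with a toy in-memory model. This is
not Kudu code: the Table class, ImmutableError, and the method names are
hypothetical, and the model assumes that a plain UPDATE or UPSERT touching an
immutable column is rejected as a whole before any cell is changed.

```python
# Toy in-memory model of the IMMUTABLE semantics described above.
# Not Kudu code: Table, ImmutableError and the method names are hypothetical.

class ImmutableError(Exception):
    """Stands in for the status reported on an immutable-cell update."""


class Table:
    def __init__(self, immutable_columns):
        self.immutable = set(immutable_columns)
        self.rows = {}  # primary key -> {column: value}

    def insert(self, key, values):
        # Immutable columns may be written freely while the row is inserted.
        if key in self.rows:
            raise KeyError("key already present")
        self.rows[key] = dict(values)

    def _apply(self, key, values, ignore):
        row = self.rows[key]
        blocked = [c for c in values if c in self.immutable]
        if blocked and not ignore:
            # Plain UPDATE/UPSERT: reject the operation (assumed: as a whole).
            raise ImmutableError(f"immutable columns: {blocked}")
        # *_IGNORE variants: skip immutable columns, update the rest.
        for col, val in values.items():
            if col not in self.immutable:
                row[col] = val

    def update(self, key, values, ignore=False):
        if key not in self.rows:
            if ignore:
                return  # UPDATE_IGNORE also ignores 'key not found'
            raise KeyError("key not found")
        self._apply(key, values, ignore)

    def upsert(self, key, values, ignore=False):
        if key not in self.rows:
            self.rows[key] = dict(values)  # no existing row: acts as an insert
        else:
            self._apply(key, values, ignore)


t = Table(immutable_columns={"first_login_timestamp"})
t.insert("alice", {"first_login_timestamp": 100, "login_timestamp": 100})
# UPSERT_IGNORE-like call: the immutable cell keeps its original value.
t.upsert("alice", {"first_login_timestamp": 200, "login_timestamp": 200},
         ignore=True)
```

In this model the second call leaves first_login_timestamp at its inserted
value while login_timestamp is updated; the same call with ignore=False
raises ImmutableError instead.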

Some use cases:
1. A column represents a semantically constant entity. The
   corresponding value is present in every record for a particular
   primary key and might change, but only its very first occurrence
   is captured. An example is 'first_login_timestamp' for a
   particular user, while 'login_timestamp' is present in every
   login record.
2. Similar to item 1, but the corresponding value, if present, is
   the same in every record for a particular primary key. Here the
   intention is to reduce the length of the column's change list.
   An example is a 'birthday' column.

This patch includes the changes on the server side and in the proto
files, plus the client-side changes required because the ColumnSchema
constructor has changed.

Change-Id: I01e5a806c0e873239b49e6d0b37a7e36578b508d
Reviewed-on: http://gerrit.cloudera.org:8080/18742
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <ale...@apache.org>


> Support setnx semantic on column
> --------------------------------
>
>                 Key: KUDU-3353
>                 URL: https://issues.apache.org/jira/browse/KUDU-3353
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, server
>            Reporter: Yingchun Lai
>            Assignee: Yingchun Lai
>            Priority: Major
>
> h1. motivation
> In some usage scenarios, a Kudu table has a column with "create time"
> semantics, meaning it holds the creation timestamp of the row. The other
> columns keep their usual semantics, for example user properties such as
> age and address.
> Neither the upstream system nor the Kudu user knows whether a row already
> exists, and every cell holds the latest value ingested from, for example,
> an event stream.
> Without the "create time" column, the Kudu user can write to the table with
> UPSERT operations, and every column supplied with data overwrites the old
> data. With the "create time" column, however, its cell is overwritten by
> subsequent UPSERT ops, which is not what we expect.
> To achieve the goal, we have to read the column first to check whether it
> is NULL. If it is NULL, we include the cell in the row; if it is not NULL,
> we drop it from the data before the UPSERT, to avoid overwriting
> "create time".
> This is expensive. Is there a way to avoid the read from Kudu?
> h1. Resolution
> We can implement a column schema with "update if null" semantics: cell data
> in the changelist updates the base data if the latter is NULL, and the
> update is ignored if it is not NULL.
> This way we can use Kudu as before, and only need to define the column as
> "update if null" when creating the table or adding the column.
>  
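
To make the quoted description concrete, here is a minimal sketch that
contrasts the read-before-UPSERT workaround with the proposed "update if
null" behaviour. A plain dict stands in for a Kudu table, and every function
and parameter name is hypothetical; nothing here is Kudu API.

```python
# Toy model of the two approaches in the issue description; a dict stands in
# for a Kudu table. All names here are hypothetical, not Kudu API.

def upsert(table, key, values):
    """Plain UPSERT: every supplied cell overwrites the stored one."""
    table.setdefault(key, {}).update(values)


def client_side_workaround(table, key, values, column="create_time"):
    """Read the row first, then drop `column` if it is already set."""
    existing = table.get(key, {})  # the extra read the issue wants to avoid
    if existing.get(column) is not None:
        values = {c: v for c, v in values.items() if c != column}
    upsert(table, key, values)


def upsert_update_if_null(table, key, values,
                          if_null_columns=("create_time",)):
    """Proposed semantics: listed columns are written only while NULL."""
    row = table.setdefault(key, {})
    for col, val in values.items():
        if col in if_null_columns and row.get(col) is not None:
            continue  # keep the first non-NULL value
        row[col] = val


workaround, proposed = {}, {}
for ts in (10, 20):  # two events for the same row, arriving in order
    event = {"create_time": ts, "age": ts}
    client_side_workaround(workaround, "user-1", event)
    upsert_update_if_null(proposed, "user-1", event)
```

Both tables end up with the first create_time and the latest age, but only
the workaround needs a read before each write.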



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
