[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578213#comment-17578213 ]
ASF subversion and git services commented on KUDU-3353:
-------------------------------------------------------

Commit b6eedb224f715ad86378a92d25f09c2084b0e2b7 in kudu's branch refs/heads/master from Yingchun Lai
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b6eedb224 ]

KUDU-3353 [schema] Add an immutable attribute on column schema (part 1)

The overview of design:
1. Add a new column attribute IMMUTABLE, meaning the column cell value
   cannot be updated after it has been written when inserting the row.
2. An attempt to modify an immutable cell of an existing row by an UPDATE
   or UPSERT operation results in returning the newly added
   Status::IsImmutable().
3. Use UPDATE_IGNORE and add UPSERT_IGNORE, which behave like UPDATE and
   UPSERT ops but ignore update errors on IMMUTABLE columns. Note that the
   rest of the columns are updated according to the operation's data; only
   the immutable columns aren't changed. With this change, UPDATE_IGNORE
   ops ignore both 'key not found' and 'update on immutable column' errors.

Some use cases:
1. A column represents a semantically constant entity. The corresponding
   value is present in every row for a particular primary key and might
   change, but it's captured upon the very first occurrence. An example is
   'first_login_timestamp' for a particular user while 'login_timestamp'
   is present in every login record.
2. Similar to item 1, but the corresponding value, if present, is the
   same in every record for a particular primary key. Here the intention
   is to reduce the length of the column's change list. An example is a
   'birthday' column.

This patch includes the changes on the server side, the proto files, and
the necessary changes on the client side because the ColumnSchema
constructor has changed.

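The semantics in the design overview above can be illustrated with a small in-memory model. This is only a sketch and not Kudu's client API or server code: `ToyTable`, `Status`, and the method names are hypothetical. It also assumes two things the commit message does not state: a failing UPDATE or UPSERT leaves the whole row unchanged, and any write to an immutable column of an existing row counts as a modification attempt, even when the value is the same.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical result codes; not Kudu's Status class."""
    OK = "ok"
    NOT_FOUND = "key not found"
    IMMUTABLE = "update on immutable column"


class ToyTable:
    """In-memory stand-in for a table with a single-column primary key."""

    def __init__(self, key_column, immutable_columns):
        self.key_column = key_column
        self.immutable = set(immutable_columns)
        self.rows = {}  # primary key value -> row dict

    def _apply(self, row, insert_if_missing, ignore_errors):
        key = row[self.key_column]
        existing = self.rows.get(key)
        if existing is None:
            if insert_if_missing:
                # First write of the row: immutable cells are set here.
                self.rows[key] = dict(row)
                return Status.OK
            # UPDATE on a missing key; UPDATE_IGNORE swallows the error.
            return Status.OK if ignore_errors else Status.NOT_FOUND
        changes = {c: v for c, v in row.items() if c != self.key_column}
        blocked = {c for c in changes if c in self.immutable}
        if blocked and not ignore_errors:
            # Assumption: the whole operation is rejected.
            return Status.IMMUTABLE
        for column, value in changes.items():
            if column not in blocked:
                existing[column] = value  # mutable columns still change
        return Status.OK

    def update(self, row):
        return self._apply(row, insert_if_missing=False, ignore_errors=False)

    def update_ignore(self, row):
        return self._apply(row, insert_if_missing=False, ignore_errors=True)

    def upsert(self, row):
        return self._apply(row, insert_if_missing=True, ignore_errors=False)

    def upsert_ignore(self, row):
        return self._apply(row, insert_if_missing=True, ignore_errors=True)


if __name__ == "__main__":
    table = ToyTable("user_id", ["first_login_timestamp"])
    first = {"user_id": 1, "first_login_timestamp": 100, "login_timestamp": 100}
    later = {"user_id": 1, "first_login_timestamp": 200, "login_timestamp": 200}
    print(table.upsert_ignore(first))  # inserts the row
    print(table.upsert(later))         # rejected in this model
    print(table.upsert_ignore(later))  # updates login_timestamp only
    print(table.rows[1])
```

In this model the 'first_login_timestamp' use case works with UPSERT_IGNORE alone: the first event for a key inserts the row, and later events refresh 'login_timestamp' without touching the immutable cell and without a prior read.
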
Change-Id: I01e5a806c0e873239b49e6d0b37a7e36578b508d
Reviewed-on: http://gerrit.cloudera.org:8080/18742
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <ale...@apache.org>


> Support setnx semantic on column
> --------------------------------
>
>                 Key: KUDU-3353
>                 URL: https://issues.apache.org/jira/browse/KUDU-3353
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, server
>            Reporter: Yingchun Lai
>            Assignee: Yingchun Lai
>            Priority: Major
>
> h1. Motivation
> In some usage scenarios, a Kudu table has a column with the semantic of
> "create time", meaning it represents the creation timestamp of the row. The
> other columns keep the same semantics as before, for example user properties
> such as age, address, etc.
> Upstream systems and Kudu users don't know whether a row exists or not, and
> every cell value is the latest one ingested from, for example, an event stream.
> Without the "create time" column, Kudu users can use UPSERT operations to
> write data to the table, and every column with data overwrites the old data.
> But with the "create time" column, the cell data will be overwritten by the
> following UPSERT ops, which is not what we expect.
> To achieve the goal, we have to read the column out to judge whether the
> column is NULL or not. If it's NULL, we can fill the row with the cell; if
> not NULL, we drop it from the data before the UPSERT, to avoid overwriting
> "create time".
> This is expensive. Is there a way to avoid a read from Kudu?
> h1. Resolution
> We can implement a column schema with the semantic of "update if null". That
> means cell data in the changelist will update the base data if the latter is
> NULL, and updates will be ignored if it is not NULL.
> So we can use Kudu similarly as before, and only need to define the column as
> "update if null" when creating a table or adding a column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)