errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1554331312


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if 
it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. 
Therefore multiple writers can concurrently write the same key, and which ever 
commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being 
replaced.
+
+For any key, but especially a large key, it can take significant time to read 
and write it. There are scenarios where it would be desirable to replace a key 
in Ozone, but only if the key has not changed since it was read. With the 
absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored 
in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the 
updateID is changed. It comes from the ratis transactionID and is generally an 
increasing number.
+
+When an existing key is over written, its existing metadata including the 
ObjectID and ACLs are mirrored onto the new key version. The only metadata 
which is replaced is any custom metadata stored on the key by the user. Upon 
commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update 
counter similar to the updateID for a key in Ozone. The data record can be read 
and displayed on a UI to be edited, and then written back to the database. 
However another user could have made an edit to the same record in the mean 
time, and if the record is written back without any checks, those edits could 
be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no 
locks are actually involved. The client reads the data along with the update 
counter. When it attempts to write the data back, it validates the record has 
not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the 
customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key 
only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended 
to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the 
original, passing the previously read updateID in a new field. Call this new 
field overwriteExpectedUpdateID.

Review Comment:
   Why do we need a new field for this? The same `OmKeyInfo` object is used for 
open and committed keys, so open keys already have an update ID field that is 
not being used.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if 
it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. 
Therefore multiple writers can concurrently write the same key, and which ever 
commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being 
replaced.
+
+For any key, but especially a large key, it can take significant time to read 
and write it. There are scenarios where it would be desirable to replace a key 
in Ozone, but only if the key has not changed since it was read. With the 
absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored 
in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the 
updateID is changed. It comes from the ratis transactionID and is generally an 
increasing number.
+
+When an existing key is over written, its existing metadata including the 
ObjectID and ACLs are mirrored onto the new key version. The only metadata 
which is replaced is any custom metadata stored on the key by the user. Upon 
commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Purely talking about what is already in Ozone here, note #6445 which fixed a 
bug in this space because encryption keys were not being replaced on overwrite. 
Also note [this 
comment](https://github.com/apache/ozone/pull/6385#discussion_r1538383070) 
which discusses a potential problem with the ACL copying.
   
   I think its important to call out the potential issues with the current 
design in this section so readers are aware when it is referenced later. 



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if 
it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. 
Therefore multiple writers can concurrently write the same key, and which ever 
commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being 
replaced.
+
+For any key, but especially a large key, it can take significant time to read 
and write it. There are scenarios where it would be desirable to replace a key 
in Ozone, but only if the key has not changed since it was read. With the 
absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored 
in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the 
updateID is changed. It comes from the ratis transactionID and is generally an 
increasing number.
+
+When an existing key is over written, its existing metadata including the 
ObjectID and ACLs are mirrored onto the new key version. The only metadata 
which is replaced is any custom metadata stored on the key by the user. Upon 
commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update 
counter similar to the updateID for a key in Ozone. The data record can be read 
and displayed on a UI to be edited, and then written back to the database. 
However another user could have made an edit to the same record in the mean 
time, and if the record is written back without any checks, those edits could 
be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no 
locks are actually involved. The client reads the data along with the update 
counter. When it attempts to write the data back, it validates the record has 
not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the 
customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key 
only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended 
to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the 
original, passing the previously read updateID in a new field. Call this new 
field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of 
the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and 
having a updateID == overwriteExpectedUpdateID. If so, it opens the key and 
stored the details including the overwriteExpectedUpdateID in the openKeyTable. 
As things stand, the other existing key metadata copied from the original key 
is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the 
overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key 
name and its updateID is unchanged. If so the key is committed, otherwise an 
error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf 
object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf 
object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers 
will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are 
existing locks taken to ensure the key open / commit is atomic. The new checks 
are performed under those locks, and come down to a couple of long comparisons, 
so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an 
existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There 
are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be 
possible to overload the existing OzoneBucket.createKey() method, which already 
has several overloaded versions, or create a new explicit method on Ozone 
bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the 
existing key (which includes the key name and existing updateID, or by passing 
the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails 
keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+         
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String 
bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig 
replicationConfigOfNewKey)
+      throws IOException 

Review Comment:
   Just some thoughts on the names used, we can continue to discuss:
   
   Would calling the API `rewriteKey` and having it take a `rewriteID` be 
sufficiently descriptive while less verbose? It seems like a pretty literal 
description of what the API does.
   
   I think we should avoid the term "update ID" in the API since that is an OM 
internal detail about how we check the key didn't get changed. By giving this 
field a more generic name, we have the option to switch this to other things 
like key version in the future. Right now this is just a token the client gets 
and gives back to the OM to check. Where the value came from within the OM is 
not important.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if 
it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. 
Therefore multiple writers can concurrently write the same key, and which ever 
commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being 
replaced.
+
+For any key, but especially a large key, it can take significant time to read 
and write it. There are scenarios where it would be desirable to replace a key 
in Ozone, but only if the key has not changed since it was read. With the 
absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored 
in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the 
updateID is changed. It comes from the ratis transactionID and is generally an 
increasing number.
+
+When an existing key is over written, its existing metadata including the 
ObjectID and ACLs are mirrored onto the new key version. The only metadata 
which is replaced is any custom metadata stored on the key by the user. Upon 
commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update 
counter similar to the updateID for a key in Ozone. The data record can be read 
and displayed on a UI to be edited, and then written back to the database. 
However another user could have made an edit to the same record in the mean 
time, and if the record is written back without any checks, those edits could 
be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no 
locks are actually involved. The client reads the data along with the update 
counter. When it attempts to write the data back, it validates the record has 
not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the 
customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key 
only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended 
to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the 
original, passing the previously read updateID in a new field. Call this new 
field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of 
the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and 
having a updateID == overwriteExpectedUpdateID. If so, it opens the key and 
stored the details including the overwriteExpectedUpdateID in the openKeyTable. 
As things stand, the other existing key metadata copied from the original key 
is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the 
overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key 
name and its updateID is unchanged. If so the key is committed, otherwise an 
error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf 
object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf 
object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers 
will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are 
existing locks taken to ensure the key open / commit is atomic. The new checks 
are performed under those locks, and come down to a couple of long comparisons, 
so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an 
existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There 
are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be 
possible to overload the existing OzoneBucket.createKey() method, which already 
has several overloaded versions, or create a new explicit method on Ozone 
bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the 
existing key (which includes the key name and existing updateID, or by passing 
the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails 
keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+         
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String 
bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig 
replicationConfigOfNewKey)
+      throws IOException 
+
+         
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy 
that from what already exisits on the server.

Review Comment:
   Why not allow the API to do atomic rewrite of certain metadata fields in 
addition to the data rewrite? It seems like more work to enforce this 
restriction than allow it, and I'm not sure what error it is trying to prevent.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if 
it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. 
Therefore multiple writers can concurrently write the same key, and which ever 
commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being 
replaced.
+
+For any key, but especially a large key, it can take significant time to read 
and write it. There are scenarios where it would be desirable to replace a key 
in Ozone, but only if the key has not changed since it was read. With the 
absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored 
in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the 
updateID is changed. It comes from the ratis transactionID and is generally an 
increasing number.
+
+When an existing key is over written, its existing metadata including the 
ObjectID and ACLs are mirrored onto the new key version. The only metadata 
which is replaced is any custom metadata stored on the key by the user. Upon 
commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update 
counter similar to the updateID for a key in Ozone. The data record can be read 
and displayed on a UI to be edited, and then written back to the database. 
However another user could have made an edit to the same record in the mean 
time, and if the record is written back without any checks, those edits could 
be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no 
locks are actually involved. The client reads the data along with the update 
counter. When it attempts to write the data back, it validates the record has 
not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the 
customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key 
only if it has not changed since the key details were originally read.
+
+To do this:

Review Comment:
   There's at least one other approach I can think of:
   1. Create key always returns the update ID. When calling the rewrite API we 
save it in memory at the client
   2. Client attaches update ID to the commit request to indicate a rewrite 
instead of a put
   3. OM checks the update ID if present and returns the corresponding 
success/fail result
   
   It would be good to turn this section into a pro/con analysis of each 
approach. Currently I see the above having advantages over the approach listed 
here, especially given this goal at the end of the doc:
   > The intention of this initial design is to make as few changes to Ozone as 
possible to enable overwriting a key if it has not changed.
   
   - Fewer proto changes:
     - Server side DB protos do not need to be changed
     - Client side create key request proto does not need to be changed
   - This implies fewer upgrade concerns
     - Only need to worry about client-server cross compatibility, as expected 
with an API change.
     - No concerns for server upgrade/downgrade.
   - Independent of the system's already problematic metadata overwrite/update 
semantics described above.
   
   The current proposal may have advantages too, but we really need a 
side-by-side comparison to make an informed decision.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to