[GitHub] [ozone] elek commented on a change in pull request #2136: HDDS-5073. Use ReplicationConfig on client side

GitBox Tue, 27 Apr 2021 03:22:36 -0700


elek commented on a change in pull request #2136:
URL: https://github.com/apache/ozone/pull/2136#discussion_r621074907




##########
File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/shell/keys/CopyKeyHandler.java
##########
@@ -82,17 +84,19 @@ protected void execute(OzoneClient client, OzoneAddress 
address)
     OzoneVolume vol = client.getObjectStore().getVolume(volumeName);
     OzoneBucket bucket = vol.getBucket(bucketName);
 
-    if (replicationFactor == null) {
-      replicationFactor = ReplicationFactor.valueOf(
-          getConf().getInt(OZONE_REPLICATION, OZONE_REPLICATION_DEFAULT));
-    }
-
     if (replicationType == null) {
       replicationType = ReplicationType.valueOf(
-          getConf().get(OZONE_REPLICATION_TYPE,
-              OZONE_REPLICATION_TYPE_DEFAULT));
+          getConf()
+              .get(OZONE_REPLICATION_TYPE, OZONE_REPLICATION_TYPE_DEFAULT));
     }
 
+    if (replication == null) {

Review comment:
       That's a good question (thanks for it), and I think there are multiple 
parts.
   
   1. This is only for `ozone sh` commands not for s3 (where we have storage 
classes) or for o3fs/ofs (where we plan to configure it cluster wide or bucket 
level. 
   
   2. Originally I had the same idea. `ReplicationConfig` today can have 
multiple parameters. (For example `3:2` can be defined as `new 
ECReplicationConfig(3,2)`). I thought that it might be better to accept only 
enums (or pre-defined strings) which are further defined by the administrators 
(or hard-coded by the developers):
    
   For example:
   
   | replication type | enum / group name | replication config | 
   |----------------------------|---|---|
   | RATIS | ONE | `new RatisReplicationConfig(ONE) |
   | RATIS | THREE | `new RatisReplicationConfig(THREE) |
   | STANDALONE | ONE | `new StandaloneReplicationConfig(ONE) |
   | EC | `3:2` | `new EcReplicationConfig(3:2)` |
   | EC | `6:4` | `new EcReplicationConfig(6:4)` |
   | EC | `10:3` | `new EcReplicationConfig(10:3)` |
   
   But when we add the `codec` to `ECReplicationConfig` (`RS` vs `XOR`) it will 
create more and more entries in this table.
   
    1. I think (at least at long term) administrators should have the 
flexibility to define new groups (shouldn't be everything hardcoded in the code)
    2. I think it's an additional complexity to manage all of these mappings. 
    
   As `ozone sh` is a low level cli tool, it seems to be easier to avoid this 
grouping and just enable to directly configure all possible parameters 
(especially as there is no limit on the SCM/OM side, we can handle different 
configuration without any problem)
   
   3. Compatibility
   
   I also felt that this approach is a more natural extension to the current 
scheme. We can introduce a new abstraction level (with new cli parameters like 
`--storage-type`) where we accept names instead of low level parameters:
   
   | name | replication type |  replication config | 
   |----------------------------|---|---|
   | REDUCED | RATIS |  `new RatisReplicationConfig(ONE) |
   | DEFAULT | RATIS | | `new RatisReplicationConfig(THREE) |
   | STANDALONE  | `new StandaloneReplicationConfig(ONE) |
   | EC_3_2 | EC  | `new EcReplicationConfig(3:2)` |
   
   But I am not sure if we need this abstraction level, especially at the 
beginning of the feature branch.
   
   4. Enum vs string 
   
   When you asked about enum, we can talk about two parts: do we need 
pre-defined groups AND/OR do we need to store in enum the groups?
   
   I learned with the port changes that enum is very dangerous. We introduced a 
new service on DN and added a new enum value for the service discovery API and 
introduced a lot of backward compatibility problem (new enum values couldn't be 
deserialized with old clients). String seems to be more flexible (we can 
validate it after protobuf conversion).
   
   This example doesn't fully apply here, as it's a full client-only change. 
(on protobuf this string is not sent), but I learned that enums (in some cases) 
can be harder
   
   TLDR: This is why I think the string based approach is better here, but 
these are somewhat subjective points, please add your opinion. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] elek commented on a change in pull request #2136: HDDS-5073. Use ReplicationConfig on client side

Reply via email to