sodonnel commented on code in PR #4929:
URL: https://github.com/apache/ozone/pull/4929#discussion_r1236614774
##########
hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/OzoneClientAdapter.java:
##########
@@ -46,9 +46,17 @@ public interface OzoneClientAdapter {
OzoneFSOutputStream createFile(String key, short replication,
boolean overWrite, boolean recursive) throws IOException;
+ OzoneFSOutputStream createFile(String key, short replication,
+ boolean overWrite, boolean recursive,
+ String ecPolicyName) throws IOException;
Review Comment:
I'd be interested to see how this looks on the HDFS side. The problem is
that the HDFS-defined interface does not allow passing an EC replication
definition. It only has "short replication" in the interface.
For HDFS, if you try to write to a folder which has an EC policy set, HDFS
detects the policy on the folder and ignores the replication field, writing EC.
There is no way to set EC for a specific key in HDFS, and you cannot write
replicated data into an EC enabled folder either.
Ozone is different. We allow the replication config to be inherited from the
bucket if nothing is passed for the key. If something is passed for the key, it
overrides whatever is set on the bucket. So you can have an EC-3-2 bucket and
write RATIS-THREE data into it.
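To make the Ozone rule above concrete, here is a minimal sketch of the override behaviour. The class and method names are hypothetical and the replication configs are modelled as plain strings, so this is an illustration of the rule rather than Ozone's actual code:

```java
// Hypothetical sketch: models replication configs as plain strings.
final class ReplicationResolver {

    // Returns the key-level config when one was passed (non-null),
    // otherwise falls back to the bucket's default.
    static String resolve(String keyLevel, String bucketDefault) {
        return keyLevel != null ? keyLevel : bucketDefault;
    }

    public static void main(String[] args) {
        // Nothing passed for the key: inherit from the bucket.
        System.out.println(resolve(null, "EC-3-2"));          // EC-3-2
        // Something passed for the key: it overrides the bucket.
        System.out.println(resolve("RATIS-THREE", "EC-3-2")); // RATIS-THREE
    }
}
```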
There is the possibility of this being a two way problem - what if someone
uses distcp to copy data from Ozone to HDFS?
* The data is EC in Ozone, but goes to a non EC folder in HDFS - how will
that work, and what will the replication factor be set to in HDFS?
* The data is Ratis in Ozone, going to an EC folder in HDFS - this is easy,
HDFS will set the policy to EC.
Then we have the HDFS to Ozone scenarios:
* HDFS EC to Ozone EC bucket
* HDFS EC to Ozone Ratis Bucket
* HDFS Replicated to Ozone EC bucket
* HDFS Replicated to Ozone Replicated
I am not sure where in distcp it calls to Ozone, but perhaps we need to
change the HDFS interface that Ozone implements first to define how this can
work from distcp, and then fix Ozone to make it use that extended interface?
I wonder if there is some flag we could pass, or a special replication value
(eg -1) to make Ozone ignore the key level setting and just use whatever is set
at the bucket? Although even then, we would not want HDFS EC data to end up as
Ratis-ONE data in Ozone if a bucket is RATIS-THREE.
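As a rough sketch of the special-value idea, something like the following could work. The constant `USE_BUCKET_DEFAULT`, the class name, and the mapping of 1 and 3 to Ratis configs are all assumptions for illustration, not existing Ozone or HDFS API:

```java
// Hypothetical sketch: a sentinel replication value that defers to the bucket.
final class SentinelReplication {

    // Assumed sentinel; no such constant exists in the current interface.
    static final short USE_BUCKET_DEFAULT = -1;

    // Maps the short replication argument to a replication config string.
    static String resolve(short requested, String bucketDefault) {
        if (requested == USE_BUCKET_DEFAULT) {
            // Ignore the key-level setting and use whatever the bucket has.
            return bucketDefault;
        }
        switch (requested) {
            case 1:
                return "RATIS-ONE";
            case 3:
                return "RATIS-THREE";
            default:
                throw new IllegalArgumentException(
                    "Unsupported replication: " + requested);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(USE_BUCKET_DEFAULT, "EC-3-2")); // EC-3-2
        System.out.println(resolve((short) 1, "RATIS-THREE"));     // RATIS-ONE
    }
}
```

The second call shows the concern: a caller that passes 1 without meaning it still gets RATIS-ONE in a RATIS-THREE bucket, so the sentinel only helps if distcp actually sends it.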
Could you point us at the code in distcp that would need to change to use a
new interface, and then we can have a think about any possible ways of changing
it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]