Bumping this thread.

If you have any opinion, please let me know.

Thanks a lot,
Marton




On 6/26/20 2:51 PM, Elek, Marton wrote:

Thank you very much for working on this, Aravindan.

Finally, I have collected my thoughts about the proposal.

First of all, I really like the concept in general, and I like the style of the documentation. It clearly explains a lot of Ozone's existing behavior, which makes the problems easier to understand.

I like the abstraction of Software Layout Version vs. Metadata Layout Version.

I have some comments, but most of them are about technical details (not about the concept itself), and they are questions and ideas, not strong opinions.

1. Online upgrade vs. offline upgrade

There is an option to do the upgrade offline: instead of calling an RPC, we could execute a CLI command.

a) For the online upgrade, we need to introduce a very specific running mode in which nobody can use the cluster (or only in read-only mode?) until the server is "finalized".

b) A CLI can do any migration and upgrade the MLV inside the database. The only open question is the old/persisted data in the Raft log, but IMHO it shouldn't be a problem:

  1. We should commit the MLV upgrade with a Raft transaction anyway.
  2. Ratis log entries are like client calls, and we are supposed to be backward compatible with old clients.

I am not sure whether the CLI approach is better (it seems simpler to me), but at least we can compare the two approaches and explain why we prefer the RPC-based method (if that is the better one).
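To make the offline option a bit more concrete, here is a rough sketch of what such a CLI could do while the service is stopped. It is Python only for brevity, and every name in it (`MetadataStore`, `offline_upgrade`, the per-version migration map) is hypothetical, not an existing Ozone API. It refuses to run if the persisted MLV is newer than the SLV, then applies each pending migration and bumps the MLV one step at a time. It deliberately ignores the Ratis side: in a real implementation the MLV change would still be committed through a Raft transaction, as mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the OM/SCM metadata DB."""
    mlv: int  # metadata layout version persisted on disk
    data: Dict[str, str] = field(default_factory=dict)


# Hypothetical migration step: the function stored under key v
# upgrades the metadata from layout v - 1 to layout v.
Migration = Callable[[MetadataStore], None]


def offline_upgrade(store: MetadataStore, slv: int,
                    migrations: Dict[int, Migration]) -> int:
    """Bring the persisted MLV up to the software layout version (SLV).

    Meant to run from a CLI while the service is stopped.
    """
    if store.mlv > slv:
        # The metadata was written by newer software; refuse to touch it.
        raise ValueError(
            f"MLV {store.mlv} is newer than SLV {slv}, refusing to upgrade")
    for version in range(store.mlv + 1, slv + 1):
        migrations[version](store)  # apply the step for this layout version
        store.mlv = version  # record progress after each completed step
    return store.mlv


if __name__ == "__main__":
    store = MetadataStore(mlv=1)
    steps = {
        2: lambda s: s.data.update({"feature-a": "enabled"}),
        3: lambda s: s.data.update({"feature-b": "enabled"}),
    }
    print(offline_upgrade(store, slv=3, migrations=steps))  # 3
    print(store.data)
```

Recording the MLV after each step means an interrupted run can simply be restarted from where it stopped.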

2. I had an interesting conversation about why HDFS clusters are not being upgraded to Hadoop 3, and it gave me some ideas.

This document proposes always using the same version for SCM and the datanodes, which keeps things simple.

I agree that it simplifies our job, but I think it can make the upgrade harder, especially for a 1000-2000 node cluster.

After the storage-class proposal, I have a different mental model:

I think there can be different types of containers with different replication strategies. Containers are classified by storage class, and the storage class defines the container replication type.

In this model, it's very easy to imagine that different datanodes can support different replication types (or replication versions).

Let's say I have 1000 nodes and I upgrade 500 of them to a specific datanode version which supports EC containers. SCM can easily handle this if it is already prepared to support different types of containers/replication based on node capabilities (which is our goal, IMHO).

In this model, it should be easy to enable independent upgrades of datanodes, which can make upgrading a big cluster much easier (but I agree with requiring OM/SCM/Recon to be upgraded at the same time).
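A rough sketch of what I mean by capability-based selection, again in Python only for brevity. `Datanode`, the capability strings and both functions are hypothetical and not the current SCM placement API. The idea is that SCM only considers datanodes that advertise the capability required by the container's replication type, so old and upgraded datanodes can coexist in the same cluster.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Datanode:
    name: str
    # Hypothetical: the replication types this datanode's software supports,
    # e.g. {"RATIS"} for an old node, {"RATIS", "EC"} for an upgraded one.
    capabilities: FrozenSet[str]


def eligible_nodes(nodes: List[Datanode], replication_type: str) -> List[Datanode]:
    """Datanodes that can host containers of the given replication type."""
    return [n for n in nodes if replication_type in n.capabilities]


def choose_nodes(nodes: List[Datanode], replication_type: str,
                 count: int) -> List[Datanode]:
    """Pick `count` eligible datanodes for a new container (naively: the first ones)."""
    candidates = eligible_nodes(nodes, replication_type)
    if len(candidates) < count:
        raise RuntimeError(
            f"only {len(candidates)} datanodes support {replication_type}, "
            f"{count} needed")
    return candidates[:count]


if __name__ == "__main__":
    old = [Datanode(f"dn{i}", frozenset({"RATIS"})) for i in range(500)]
    new = [Datanode(f"dn{i}", frozenset({"RATIS", "EC"})) for i in range(500, 1000)]
    cluster = old + new
    print(len(eligible_nodes(cluster, "EC")))     # 500
    print(len(eligible_nodes(cluster, "RATIS")))  # 1000
    print([n.name for n in choose_nodes(cluster, "EC", 3)])
```

With 500 of the 1000 nodes upgraded, EC containers can only be placed on the upgraded half, while Ratis containers can still use every node. A real placement policy would of course also consider racks, load and free space; taking the first candidates is only to keep the sketch short.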


What do you think about this?


3. Finalize

Personally, I don't like the word "finalize". It suggests that there is an upgrade process which can be "finalized", but in fact we don't have such a process: we only start doing any work AFTER the finalize button is pushed.

I know that it comes from HDFS history, but I would prefer more generic and expressive terms (for example: jar/binary upgrade vs. metadata upgrade).

In the end I learned what "finalize" means (thanks to your patient explanation during our offline conversation ;-) ), but we can make it easier to understand for future users.

4. During your presentation you talked about downgrade/rollback. I felt that there could be a lot of tricky corner cases related to Ratis + snapshots. I like it as a concept (though my 2nd point is more important to me, if possible), but I think we will see tricky technical problems at the code level.


Thanks again for the great work,
Marton

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
