Re: Ozone non-rolling upgrades

Elek, Marton Tue, 25 Aug 2020 05:22:58 -0700


Bumping this thread.

If you have any opinion, please let me know.

Thanks a lot,
Marton




On 6/26/20 2:51 PM, Elek, Marton wrote:

Thank you very much for working on this, Aravindan.

Finally, I collected my thoughts about the proposal.
First of all, I really like the concept in general, and I like the style of the documentation. It clearly explains a lot of the existing behavior of Ozone, which makes the problems easier to understand.
I like the abstraction of Software Layout Version vs. Metadata Layout Version.
I have some comments, but most of them are about technical details (not about the concept itself). And they are questions and ideas, not strong opinions.
1. Online upgrade vs. offline upgrade
There is an option to do the upgrade offline: instead of calling an RPC, executing a CLI.
a) For an online upgrade we need to introduce a very specific running mode, which means that nobody can use the cluster (or only in read-only mode?) until the server is "finalized".
b) A CLI can do any migration and upgrade the MLV inside the database. The only question is the old / persisted data in the Raft log, but IMHO it shouldn't be a problem:
  1. we should commit the MLV upgrade with a Raft transaction anyway
  2. Ratis log entries are like client calls, and we are supposed to be backward compatible with old clients
I am not sure if the CLI approach is better (it seems simpler to me), but at least we can compare the two approaches and explain why we prefer the RPC-based method (if that is the better one).
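To make the offline option concrete, here is a minimal sketch of what such a CLI step could do: compare the MLV stored in the database with the SLV of the installed software and apply the missing migrations in order. Everything named here (MetadataStore, MIGRATIONS, offline_upgrade) is hypothetical and only for illustration; it is not the Ozone API, and a real version would commit each MLV bump through a Raft transaction as noted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the on-disk metadata database."""
    mlv: int                                   # Metadata Layout Version
    applied: List[str] = field(default_factory=list)


def migrate_to_2(store: MetadataStore) -> None:
    # Placeholder for a real layout change (hypothetical).
    store.applied.append("layout-2")


def migrate_to_3(store: MetadataStore) -> None:
    # Placeholder for a real layout change (hypothetical).
    store.applied.append("layout-3")


# Key N upgrades the store from MLV N-1 to MLV N.
MIGRATIONS: Dict[int, Callable[[MetadataStore], None]] = {
    2: migrate_to_2,
    3: migrate_to_3,
}


def offline_upgrade(store: MetadataStore, slv: int,
                    migrations: Dict[int, Callable[[MetadataStore], None]]) -> int:
    """Bring the store's MLV up to the software's SLV, one step at a time."""
    if store.mlv > slv:
        raise ValueError("metadata layout is newer than the software layout")
    for target in range(store.mlv + 1, slv + 1):
        migrations[target](store)
        # Assumption: in a real cluster this bump would be a Raft transaction.
        store.mlv = target
    return store.mlv


if __name__ == "__main__":
    store = MetadataStore(mlv=1)
    print(offline_upgrade(store, slv=3, migrations=MIGRATIONS), store.applied)
```

Running the step a second time is a no-op, because the loop is empty once the MLV equals the SLV.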
2. I had an interesting conversation about why HDFS clusters are not upgraded to Hadoop 3, and it gave me some thoughts.
This document proposes to always use the same version for SCM and the datanodes, which makes it simple.
I agree that it simplifies our job, but I think it can make the upgrade harder, especially for a 1-2000 node cluster.
After the storage-class proposal I have a different mental model:
I think there can be different types of containers with different replication strategies. Containers are classified by storage-class, and the storage-class defines the container replication type.
In this model it's very easy to imagine that different datanodes can support different replication types (or replication versions).
Let's say I have 1000 nodes and I upgrade 500 of them to a specific datanode version which can support EC containers. SCM can easily manage this situation if it's already prepared to support different types of containers / replications (which is our goal, IMHO) based on node capabilities.
In this model it should be easy to enable independent upgrades of datanodes, which can make it much easier to upgrade a big cluster (but I agree to require upgrading OM/SCM/Recon at the same time).
What do you think about this?
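A rough sketch of the capability-based model from point 2, assuming each datanode reports the replication types it supports and a storage-class maps to one replication type. The names (Datanode, STORAGE_CLASSES, choose_nodes) and the two example storage-classes are hypothetical, not existing SCM code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Datanode:
    name: str
    replication_types: FrozenSet[str]   # capabilities reported by the node


# Hypothetical mapping: storage-class -> container replication type.
STORAGE_CLASSES: Dict[str, str] = {
    "STANDARD": "RATIS",
    "ARCHIVE": "EC",
}


def choose_nodes(nodes: List[Datanode], storage_class: str,
                 required: int) -> List[Datanode]:
    """Pick `required` datanodes that can host containers of this storage-class."""
    replication_type = STORAGE_CLASSES[storage_class]
    candidates = [n for n in nodes if replication_type in n.replication_types]
    if len(candidates) < required:
        raise ValueError(
            f"only {len(candidates)} datanodes support {replication_type}")
    return candidates[:required]


if __name__ == "__main__":
    cluster = [
        Datanode("dn1", frozenset({"RATIS"})),          # not yet upgraded
        Datanode("dn2", frozenset({"RATIS", "EC"})),    # upgraded
        Datanode("dn3", frozenset({"RATIS", "EC"})),    # upgraded
    ]
    print([n.name for n in choose_nodes(cluster, "ARCHIVE", 2)])
```

With this kind of filter, old and new datanodes can coexist: EC containers land only on upgraded nodes, while existing replication types keep using the whole cluster.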


3. Finalize
Personally I don't like the word "finalize". It suggests that we have an upgrade process which can be "finalized", but in fact we don't have such a process. We only start doing any work AFTER the finalize button is pushed.
I know that it comes from the HDFS history, but I would prefer to use more generic and expressive words (for example: jar/binary upgrade vs. metadata upgrade).
In the end I learned what "finalize" means (thanks to your patient explanation during our offline conversation ;-) ), but we can make it easier to understand for the next users.
4. During your presentation you talked about downgrade/rollback. I felt that there could be a lot of tricky corner cases related to Ratis + snapshots. As a concept I like it (but my 2nd point is more important for me, if possible), but I think we will see tricky technical problems at the code level.
Thanks again for the great work,
Marton

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

