Bumping this thread.
If you have any opinion, please let me know.
Thanks a lot,
Marton
On 6/26/20 2:51 PM, Elek, Marton wrote:
Thank you very much for working on this, Aravindan.
Finally, I collected my thoughts about the proposal.
First of all, I really like the concept in general, and I like the style
of the documentation. It clearly explains a lot of Ozone's existing
behavior, which makes the problems easier to understand.
I like the abstraction of Software Layout Version vs. Metadata
Layout Version.
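Just to make sure we are talking about the same thing, here is a minimal sketch of how I understand the abstraction (all class, field, and method names are my own illustration, not the actual Ozone API): the binary ships with a Software Layout Version, the persisted metadata carries a Metadata Layout Version, and finalization is the step that moves the MLV up to the SLV.

```java
// Illustrative sketch only -- names are assumptions, not the real Ozone API.
public class LayoutVersionCheck {
    // The layout version the running binary supports (shipped with the jar).
    static final int SOFTWARE_LAYOUT_VERSION = 2;

    // The layout version read from the persisted metadata (e.g. RocksDB).
    static int metadataLayoutVersion = 1;

    // A component is in the "pre-finalized" state while the persisted
    // metadata is older than what the software supports.
    static boolean needsFinalization() {
        return metadataLayoutVersion < SOFTWARE_LAYOUT_VERSION;
    }

    // Finalization moves the persisted MLV up to the SLV; only after this
    // point may features introduced between the two versions be used.
    static void finalizeUpgrade() {
        metadataLayoutVersion = SOFTWARE_LAYOUT_VERSION;
    }

    public static void main(String[] args) {
        System.out.println("needs finalization: " + needsFinalization());
        finalizeUpgrade();
        System.out.println("needs finalization: " + needsFinalization());
    }
}
```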
I have some comments, but most of them are about technical details (not
about the concept itself), and they are questions and ideas rather than
strong opinions.
1. Online upgrade vs. offline upgrade
There is an option to do the upgrade offline: instead of calling an RPC,
we could execute a CLI tool.
a) For an online upgrade we need to introduce a very specific running
mode in which nobody can use the cluster (or only in read-only mode?)
until the server is "finalized".
b) A CLI can do any migration and upgrade the MLV inside the database.
The only open question is the old / persisted data in the Raft log, but
IMHO it shouldn't be a problem:
1. we should commit the MLV upgrade with a Raft transaction anyway
2. Ratis log entries are like client calls, and we are supposed to be
backward compatible with old clients
I am not sure whether the CLI approach is better (it seems simpler to
me), but at least we can compare the two approaches and explain why we
prefer the RPC-based method (if that turns out to be the better one).
2. I had an interesting conversation about why HDFS clusters are not
upgraded to Hadoop 3, and it left me with some thoughts.
This document proposes to always use the same version for SCM and
datanodes, which keeps things simple.
I agree that it simplifies our job, but I think it can make the upgrade
harder, especially for a 1-2000 node cluster.
After the storage-class proposal I have a different mental model:
there can be different types of containers with different replication
strategies. Containers are classified by storage-class, and the
storage-class defines the container's replication type.
In this model it's very easy to imagine that different datanodes can
support different replication types (or replication versions).
Let's say I have 1000 nodes and I upgrade 500 of them to a datanode
version which supports EC containers. SCM can easily handle this if it's
already prepared to support different types of containers / replication
(which is our goal, IMHO) based on node capabilities.
In this model it should be easy to enable independent upgrades of
datanodes, which can make it much easier to upgrade a big cluster.
(But I agree with requiring OM/SCM/Recon to be upgraded at the same
time.)
What do you think about this?
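To illustrate what I mean, here is a rough sketch (every name here is hypothetical, just for illustration -- this is not the current SCM code): if each datanode reports which replication types it supports, SCM can place a container only on nodes whose capabilities match the container's storage-class, so mixed-version datanodes can coexist during a rolling upgrade.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only -- all names are assumptions, not the real SCM API.
public class CapabilityAwarePlacement {

    enum ReplicationType { RATIS_THREE, EC }

    // Each datanode advertises the replication types it can handle.
    record DatanodeInfo(String id, Set<ReplicationType> capabilities) {}

    // SCM would consider only the nodes capable of the container's
    // replication type (derived from its storage-class).
    static List<DatanodeInfo> eligibleNodes(List<DatanodeInfo> nodes,
                                            ReplicationType required) {
        return nodes.stream()
            .filter(n -> n.capabilities().contains(required))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DatanodeInfo> nodes = List.of(
            new DatanodeInfo("dn1", Set.of(ReplicationType.RATIS_THREE)),
            new DatanodeInfo("dn2", Set.of(ReplicationType.RATIS_THREE,
                                           ReplicationType.EC)));
        // Only the upgraded node (dn2) is eligible for an EC container;
        // both nodes remain eligible for Ratis-replicated containers.
        System.out.println(eligibleNodes(nodes, ReplicationType.EC));
    }
}
```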
3. Finalize
Personally I don't like the word "finalize". It suggests that we have an
upgrade process which can be "finalized", but in fact we don't have such
a process; we only start doing any work AFTER the finalize button is
pushed.
I know it comes from HDFS history, but I would prefer more generic and
expressive words (for example: jar/binary upgrade vs. metadata upgrade).
In the end I learned what "finalize" means (thanks to your patient
explanation during an offline conversation ;-) ), but we can make it
easier to understand for future users.
4. During your presentation you talked about downgrade/rollback. I
felt that there could be a lot of tricky corner cases related to Ratis +
snapshots. As a concept I like it (though my 2nd point is more important
to me, if possible), but I think we will see tricky technical problems
at the code level.
Thanks again for the great work,
Marton
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]