[jira] [Comment Edited] (HDFS-14560) Allow block replication parameters to be refreshable

Stephen O'Donnell (JIRA) Tue, 11 Jun 2019 13:15:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861447#comment-16861447
 ]


Stephen O'Donnell edited comment on HDFS-14560 at 6/11/19 8:14 PM:
-------------------------------------------------------------------

[~hexiaoqiao] I agree that it would nice for as many parameters as possible to 
be refreshable without a restart. Based on how things work at the moment, you 
need to add some logic for each parameter, so making the majority of parameters 
configurable would probably need a rethink on how things work.

It would be good to gather a list of parameters that administrators would often 
like to change and then we could see how we could enable them. I picked these 
ones, as adjusting the speed of replication is something we see a need for 
frequently. Similar to the balancer parameters, bandwidth and concurrent moves 
- they are already refreshable in the datanodes without a restart.

[~elgoiri] Thanks for the detailed review. I have uploaded v003. Addressing 
your comments:

> I would use LambdaTestUtils#intercept for verifying the exceptions.

OK, I had never seen that before. TBH, I am not sure it improves the 
readability of the code, but it may be just because I am new to it. I have 
changed the tests to use intercept, so please check I am using it correctly, 
especially as we need to extract the 'cause' exception from the 
ReconfigurationException to validate the reason for the failure.

> I would verify that the initial values in 
> TestRefreshNamenodeReplicationConfig (at least in 
> testParamsCanBeReconfigured).

Good point, I have added those assertions.

> I am not sure there is a point on capturing ReconfigurationException and then 
> fail in testParamsCanBeReconfigured(), just letting it go should be enough.

True, I have removed the exception block.

> We can use for (String key : keys) in a couple of the cases where we use 
> indices.

Updated to use the key : keys syntax.

> Let's use Logger style logs with {}.

Done.

> It is a little inconsistent the order of the parameters in 
> reconfigureSPSModeEvent and reconfReplicationParameters.

Looking down the if statement, some calls have property, newval, others have 
newval, property - I have updated reconfReplicationParameters to be the same as 
the one next to it to keep things as consistent as possible.

> Expand the imports in TestRefreshNamenodeReplicationConfig.

Done.

> Do we need 1 DN for the mini cluster? I would like to minimize the time for 
> this test.

I guess we don't I have set it to zero.

> Looks like getReplicationStreamsHardLimit() and getBlocksReplWorkMultiplier() 
> are there just for the tests. What about reducing the visibility and marking 
> it as VisibleForTesting?

They are there for the tests, but also the refresh code gets the value it just 
set to return it and log it, proving the new value has been set in the logs. I 
guess that is a bit redundant, but I was following the pattern used in a couple 
of the other methods I checked.

> The preconditions seem pretty repetitive. I think this could be set as a 
> common pattern somewhere but at least here.

I have created a helper method in the class and reused it in the 3 setters.





was (Author: sodonnell):
[~hexiaoqiao] I agree that it would nice for as many parameters as possible to 
be refreshable without a restart. Based on how things work at the moment, you 
need to add some logic for each parameter, so making the majority of parameters 
configurable would probably need a rethink on how things work.

It would be good to gather a list of parameters that administrators would often 
like to change and then we could see how we could enable them. I picked these 
ones, as adjusting the speed of replication is something we see a need for 
frequently. Similar to the balancer parameters, bandwidth and concurrent moves 
- they are already refreshable in the datanodes without a restart.

[~elgoiri] Thanks for the detailed review. I have uploaded v003. Addressing 
your comments:

> I would use LambdaTestUtils#intercept for verifying the exceptions.

OK, I had never seen that before. TBH, I am not sure it improves the 
readability of the code, but it may be just because I am new to it. I have 
changed the tests to use intercept, so please check I am using it correctly.

> I would verify that the initial values in 
> TestRefreshNamenodeReplicationConfig (at least in 
> testParamsCanBeReconfigured).

Good point, I have added those assertions.

> I am not sure there is a point on capturing ReconfigurationException and then 
> fail in testParamsCanBeReconfigured(), just letting it go should be enough.

True, I have removed the exception block.

> We can use for (String key : keys) in a couple of the cases where we use 
> indices.

Updated to use the key : keys syntax.

> Let's use Logger style logs with {}.

Done.

> It is a little inconsistent the order of the parameters in 
> reconfigureSPSModeEvent and reconfReplicationParameters.

Looking down the if statement, some calls have property, newval, others have 
newval, property - I have updated reconfReplicationParameters to be the same as 
the one next to it to keep things as consistent as possible.

> Expand the imports in TestRefreshNamenodeReplicationConfig.

Done.

> Do we need 1 DN for the mini cluster? I would like to minimize the time for 
> this test.

I guess we don't I have set it to zero.

> Looks like getReplicationStreamsHardLimit() and getBlocksReplWorkMultiplier() 
> are there just for the tests. What about reducing the visibility and marking 
> it as VisibleForTesting?

They are there for the tests, but also the refresh code gets the value it just 
set to return it and log it, proving the new value has been set in the logs. I 
guess that is a bit redundant, but I was following the pattern used in a couple 
of the other methods I checked.

> The preconditions seem pretty repetitive. I think this could be set as a 
> common pattern somewhere but at least here.

I have created a helper method in the class and reused it in the 3 setters.




> Allow block replication parameters to be refreshable
> ----------------------------------------------------
>
>                 Key: HDFS-14560
>                 URL: https://issues.apache.org/jira/browse/HDFS-14560
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14560.001.patch, HDFS-14560.002.patch
>
>
> There are 3 key parameters that control the speed of block replication across 
> the cluster:
> {code}
> dfs.namenode.replication.max-streams
> dfs.namenode.replication.max-streams-hard-limit
> dfs.namenode.replication.work.multiplier.per.iteration
> {code}
> These are used when decomissioning nodes and when under replicated blocks are 
> being recovered across the cluster. There are times when it may be desirable 
> to increase the speed of replication and then reduce it again (eg during off 
> peak hours) without restarting the namenode.
> Therefore it would be good to be able to reconfigure / refresh these 
> parameters dynamically without a namenode restart.
> This Jira is to allow these parameters to be refreshed at runtime without a 
> restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14560) Allow block replication parameters to be refreshable

Reply via email to