[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007258#comment-17007258
 ] 

Yiqun Lin commented on HDFS-15087:
--

{quote}
Snapshot: Using the snapshot diff maybe? I'm not sure.
{quote}
I think Inigo's proposal is that we first create an initial snapshot and do
the SaveTree from it. For the incremental changes to the source folder during
the subsequent phases, we create a new snapshot, compute the snapshot diff for
SaveTree, and then repeat the same procedure as the first time. Once we find
that only very little data has changed (maybe with a threshold value here), we
block writes, do the last SaveTree, .., transfer blocks and add hard links, and
we are finished.
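Below is a minimal sketch of the convergence loop described above. It assumes each round can be modeled as one call that syncs the pending changes and returns how many entries changed. For the snapshot approach, a round is the initial snapshot plus SaveTree, and then one snapshot diff per later phase. `IncrementalSync`, `Round`, `migrate`, and the callbacks are hypothetical names, not part of the HFR patch. The same loop shape also fits the incremental distcp option.

```java
/**
 * Hypothetical sketch: repeat incremental sync rounds while writes stay
 * open, then block writes for one final round. Not HFR's actual code.
 */
final class IncrementalSync {

    /** One sync round; returns the number of changed entries it copied. */
    @FunctionalInterface
    interface Round {
        long syncPendingChanges();
    }

    /**
     * Runs rounds until a round's delta is at or below {@code threshold}
     * (or {@code maxRounds} is reached), then blocks writes, runs one final
     * round, and finishes the cutover. Returns the total rounds executed.
     */
    static int migrate(Round round, long threshold, int maxRounds,
                       Runnable blockWrites, Runnable finishCutover) {
        int rounds = 0;
        long changed;
        do {
            changed = round.syncPendingChanges(); // writes still allowed here
            rounds++;
        } while (changed > threshold && rounds < maxRounds);

        blockWrites.run();          // e.g. forbid writes under the source path
        round.syncPendingChanges(); // final round: should be small now
        rounds++;
        finishCutover.run();        // e.g. transfer blocks, hard link, mount table
        return rounds;
    }
}
```

The `maxRounds` cap is a design choice of this sketch. If the source keeps changing faster than the rounds converge, writes are blocked anyway instead of looping forever. A real implementation might abort at that point instead.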

{quote}
The approach described in the doc requires hard linking. I think this is a good 
idea for the start but I would push to make it pluggable/abstract so in the 
future we can have other implementations.
{quote}
I am +1 for this; it will be better to make it pluggable.
Everything else looks good to me.

[~LiJinglun], feel free to attach your initial patch.

> RBF: Balance/Rename across federation namespaces
> 
>
> Key: HDFS-15087
> URL: https://issues.apache.org/jira/browse/HDFS-15087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Priority: Major
> Attachments: HFR_Rename Across Federation Namespaces.pdf
>
>
> The Xiaomi storage team has developed a new feature called HFR (HDFS 
> Federation Rename) that enables us to do balance/rename across federation 
> namespaces. The idea is to first move the meta to the dst NameNode and then 
> link all the replicas. It has been working in our largest production cluster 
> for 2 months. We use it to balance the namespaces. It turns out HFR is fast 
> and flexible. The details can be found in the design doc. 
> Looking forward to a lively discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007253#comment-17007253
 ] 

Jinglun commented on HDFS-15087:


Hi [~ayushtkn], the approach in the design doc doesn't cover non-shared DNs.
If we let the DNs transfer the blocks, the process would be: *block writes ->
saveTree -> graftTree -> transfer blocks -> update mount table*. Since
bandwidth is limited, I'm afraid the process would take too long. In this case
I think we can use option 3, "Incremental Distcp", to do the balance. We only
need to block writes during the final round of distcp, so the write-blocking
period should be shorter.

For a cluster without shared DNs, I think we cannot support normal user rename
operations because the data transfer takes too much time.

So my initial patch will include both option 1 and option 3.




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007212#comment-17007212
 ] 

Ayush Saxena commented on HDFS-15087:
-

Thanks [~LiJinglun] for the updates. One last doubt: as stated, the DataNodes
need to be shared, but we have a couple of use cases where the federated
clusters don't have shared DNs.
So, would that be a limitation of the approach in the design doc, or is it
covered? Do we fall back to some other mechanism, like directly copying the
block to the other DN, or something like that?
If this case is covered in any way, I am +1 for the approach.
Being pluggable should not block us from upgrading to a better approach, if
we come up with one.




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007186#comment-17007186
 ] 

Jinglun commented on HDFS-15087:


I think we have 4 options for the balance/rename:
 # The design doc way: block writes -> saveTree -> graftTree -> hardlink -> 
update router mount table
 # FastCopy: block writes -> FastCopy -> update router mount table
 # Incremental DistCp: Distcp many times -> block writes -> final distcp -> 
update router mount table
 # Snapshot: Using the snapshot diff maybe? I'm not sure.

I'd prefer option 1 because it's fast and can be used for both balance and
rename. FastCopy hasn't been maintained for a while, so I think option 2 would
need a lot of work to update FastCopy. The weak points of distcp were mentioned
before: "too slow to support rename" + "doubles the space" + "distcp listing
costs too much time when the src-path is big".

The Scheduler model in HFR is pluggable, so choosing option 1 doesn't mean
rejecting all the other options. So I think maybe we can start with option 1.

If we all agree, I'll upload the initial patch.




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007176#comment-17007176
 ] 

Jinglun commented on HDFS-15087:


Hi [~ayushtkn], thanks for your comments. The meta (INodes, Blocks, tree
structure) is serialized the same way as in the FSImage, so everything is
preserved.

HFR can support EC files too. We are developing that now.




[jira] [Commented] (HDFS-8591) Remove support for deprecated configuration key dfs.namenode.decommission.nodes.per.interval

2020-01-02 Thread Danny Becker (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007166#comment-17007166
 ] 

Danny Becker commented on HDFS-8591:


There is an issue with the logic here that can cause the decommissioner to get
stuck in a nearly infinite loop. When the decommissioner checks a datanode that
is in_maintenance, no blocks are checked, so the count of checked blocks never
grows. The decommissioner keeps looping through this until the datanode is no
longer in_maintenance or it reaches Integer.MAX_VALUE.
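The toy model below makes the failure mode concrete. It is not the actual NameNode decommission monitor code. One tick walks the tracked nodes cyclically and stops only once a per-tick block budget is spent. A node in maintenance adds zero checked blocks, so only the artificial `safetyCap` ends the loop; it stands in for the `Integer.MAX_VALUE` bound. For comparison, the model also caps a tick at one pass over the nodes. That is one possible guard and an assumption here, not the committed fix.

```java
import java.util.List;

/** Toy model of one decommission-monitor tick; not Hadoop's real code. */
final class DecomTickModel {

    static final class Node {
        final String name;
        final boolean inMaintenance;
        final int blocks;

        Node(String name, boolean inMaintenance, int blocks) {
            this.name = name;
            this.inMaintenance = inMaintenance;
            this.blocks = blocks;
        }
    }

    /** Budget-only loop: returns how many node visits happened. */
    static long tickBlockBudgetOnly(List<Node> nodes, int blocksPerTick,
                                    long safetyCap) {
        if (nodes.isEmpty()) {
            return 0;
        }
        long visits = 0;
        long checked = 0;
        int idx = 0;
        while (checked < blocksPerTick && visits < safetyCap) {
            Node n = nodes.get(idx);
            idx = (idx + 1) % nodes.size(); // cyclic iteration over nodes
            visits++;
            if (!n.inMaintenance) {
                checked += n.blocks;        // maintenance nodes add nothing
            }
        }
        return visits;
    }

    /** Guarded loop: also stops after one full pass over the nodes. */
    static long tickWithPassLimit(List<Node> nodes, int blocksPerTick) {
        long visits = 0;
        long checked = 0;
        for (int idx = 0; idx < nodes.size() && checked < blocksPerTick; idx++) {
            Node n = nodes.get(idx);
            visits++;
            if (!n.inMaintenance) {
                checked += n.blocks;
            }
        }
        return visits;
    }
}
```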

> Remove support for deprecated configuration key 
> dfs.namenode.decommission.nodes.per.interval
> 
>
> Key: HDFS-8591
> URL: https://issues.apache.org/jira/browse/HDFS-8591
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 3.0.0-alpha1
>
> Attachments: hdfs-8591.001.patch
>
>
> dfs.namenode.decommission.nodes.per.interval is deprecated in branch-2 and 
> can be removed in trunk.






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2020-01-02 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007055#comment-17007055
 ] 

Ahmed Hussein commented on HDFS-14854:
--

[~sodonnell], I see that you commented out one of the checks in
{{TestDecommissioningStatus.testDecommissionStatus()}}.

Can you please share your experience with that test case and why you decided
to remove the check?

There are some old Jiras suggesting that {{testDecommissionStatus}} is flaky.
 * HDFS-12188
 * HDFS-9599
 * HDFS-9950
 * HDFS-10755
  

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: 012_to_013_changes.diff, 
> Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch, 
> HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch, 
> HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch, 
> HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch, 
> HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster.
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled, giving the choice between the existing implementation and the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006918#comment-17006918
 ] 

Ayush Saxena commented on HDFS-15087:
-

Does this preserve the EC policy, ACLs, etc.?




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006816#comment-17006816
 ] 

Jinglun commented on HDFS-15087:


At Xiaomi we have an incremental version of HFR using distcp. The idea is to
keep submitting distcp round by round until a round can be done in a short
time. Then we block all writes and do the final round of distcp. But it still
has weak points:
 # It's slow and can't be used for normal user renames.
 # It doubles the space, so it can't be used on big paths.
 # The distcp needs to list the src-path, which can take a lot of time if the
src-path is big. This limits the speed of the final distcp round.

Listing the src-path with multiple threads might resolve weak point 3.
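The sketch below is a rough illustration of multi-threaded listing. It lists a local directory tree with a `ForkJoinPool` and forks one task per subdirectory. It uses `java.nio.file` on the local filesystem purely as a stand-in. An HDFS version would issue `FileSystem#listStatus` calls against the source NameNode instead. `ParallelLister` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical parallel lister; a local stand-in for src-path listing. */
final class ParallelLister {

    /** Lists one directory and forks a subtask for each child directory. */
    private static final class ListTask extends RecursiveTask<List<Path>> {
        private final Path dir;

        ListTask(Path dir) {
            this.dir = dir;
        }

        @Override
        protected List<Path> compute() {
            List<Path> files = new ArrayList<>();
            List<ListTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path p : entries) {
                    if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                        ListTask sub = new ListTask(p);
                        sub.fork();           // list the subdirectory concurrently
                        subtasks.add(sub);
                    } else {
                        files.add(p);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (ListTask sub : subtasks) {
                files.addAll(sub.join());     // merge child results
            }
            return files;
        }
    }

    /** Returns every non-directory entry under {@code root}. */
    static List<Path> listFiles(Path root, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new ListTask(root));
        } finally {
            pool.shutdown();
        }
    }
}
```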




[jira] [Comment Edited] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006657#comment-17006657
 ] 

Jinglun edited comment on HDFS-15087 at 1/2/20 1:36 PM:


Hi [~elgoiri], thanks for your nice comments!
{quote}Would it be possible to leverage HDFS snapshots instead of blocking 
writes and having the new tree related calls? Intuitively, I would expect for 
snapshots to cover 90% of the features described in the doc. I would try to 
improve snapshots to cover 100%.
{quote}
-I'm not familiar with the snapshot. In my rough thought as long as the 
snapshot meta could be transferred and rebuilt the HFR could support it.- -I'll 
try to write a demo to transfer and rebuild the snapshot across NameNodes-.

I had a quick look at snapshots and I'm not sure how to use them. Do you mean
using the diff of snapshots so we can do the balance in an incremental way?
{quote}The approach described in the doc requires hard linking. I think this is 
a good idea for the start but I would push to make it pluggable/abstract so in 
the future we can have other implementations.
{quote}
Good idea. The design of HFR has already considered it. HFR is a combination
of many tasks, and each task is pluggable. For example, if we want to use copy
instead of hardlink, we can switch the HardLink task to a CopyReplica task.
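Here is a minimal sketch of the "each task is pluggable" idea. It assumes a job is just an ordered list of tasks that share a log. Only the task names come from this thread: HardLink and CopyReplica, plus SaveTree and GraftTree from the design discussion. `HfrTask`, `HfrJob`, and `step` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical task abstraction; not HFR's real interface. */
interface HfrTask {
    String name();

    void execute(List<String> log);
}

/** Runs tasks in order; swapping one task changes one step of the job. */
final class HfrJob {
    private final List<HfrTask> tasks = new ArrayList<>();

    HfrJob then(HfrTask task) {
        tasks.add(task);
        return this;
    }

    /** Executes every task in order; an exception stops the job. */
    List<String> run() {
        List<String> log = new ArrayList<>();
        for (HfrTask t : tasks) {
            t.execute(log);
        }
        return log;
    }

    /** Placeholder task that only records its name. */
    static HfrTask step(String name) {
        return new HfrTask() {
            public String name() {
                return name;
            }

            public void execute(List<String> log) {
                log.add(name);
            }
        };
    }
}
```

To switch from hard links to copying, the caller only replaces the last `then(...)`, for example `step("HardLink")` becomes `step("CopyReplica")`. The earlier metadata steps stay unchanged.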
{quote}Is hard linking available in Windows?
{quote}
After HADOOP-11483 we use the JDK's Files.createLink() to create the
hardlinks. I tested Files.createLink() on Windows and it works.

See the Java tutorial: [https://docs.oracle.com/javase/tutorial/essential/io/links.html]
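A small, self-contained check of the JDK call mentioned above. `Files.createLink(link, existing)` takes the new link first and the existing file second. It throws `UnsupportedOperationException` if the file store cannot create hard links. The replica file names in the usage are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class HardLinkDemo {
    /**
     * Creates a hard link to an existing file and reports whether both
     * names now refer to the same underlying file.
     */
    static boolean linkAndVerify(Path existing, Path link) throws IOException {
        Files.createLink(link, existing); // argument order: (new link, existing)
        return Files.isSameFile(link, existing);
    }
}
```

Both names point at the same file, so the data stays reachable through the link even after the original name is deleted. This is what lets a hardlink-based move avoid copying replica bytes.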






[jira] [Commented] (HDFS-15092) TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed

2020-01-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006766#comment-17006766
 ] 

Íñigo Goiri commented on HDFS-15092:


Any chance we can use GenericTestUtils#waitFor? 
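The suggestion replaces a fixed sleep with polling until a condition holds or a timeout expires. The helper below is a self-contained stand-in written for illustration, not Hadoop's `GenericTestUtils`, whose `waitFor` signature has varied across Hadoop versions. In the test, the check would be something like "the count asserted at TestRedudantBlocks.java:138 equals 5".

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Illustrative polling helper; not Hadoop's GenericTestUtils. */
final class WaitUtil {
    /**
     * Re-evaluates {@code check} every {@code checkEveryMillis} until it is
     * true, or throws TimeoutException after {@code waitForMillis}.
     */
    static void waitFor(BooleanSupplier check, long checkEveryMillis,
                        long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) { // overflow-safe comparison
                throw new TimeoutException(
                    "condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }
}
```

Compared with a longer fixed sleep, polling returns as soon as the condition holds on fast machines. It still tolerates slow ones, up to the timeout.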

> TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
> -
>
> Key: HDFS-15092
> URL: https://issues.apache.org/jira/browse/HDFS-15092
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-15092.001.patch
>
>
> TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
> {quote}
> java.lang.AssertionError: 
> Expected :5
> Actual   :4
>  
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock(TestRedudantBlocks.java:138)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {quote}
> Maybe we should increase the sleep time.






[jira] [Commented] (HDFS-15092) TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed

2020-01-02 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006757#comment-17006757
 ] 

Surendra Singh Lilhore commented on HDFS-15092:
---

+1, LGTM. I think it fails only on some slow machines.

 




[jira] [Commented] (HDFS-15091) Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock

2020-01-02 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006735#comment-17006735
 ] 

Xiaoqiao He commented on HDFS-15091:


v02 LGTM, +1 from my side.

> Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock
> 
>
> Key: HDFS-15091
> URL: https://issues.apache.org/jira/browse/HDFS-15091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15091-01.patch, HDFS-15091-02.patch
>
>
> As of now, all APIs check for superuser before taking the lock. The same can 
> be done for the cache commands and setQuota.
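Here is a minimal model of the ordering this issue asks for: reject non-superusers before requesting the namespace write lock, so unauthorized calls never contend for it. `NamespaceModel`, its counter, and the use of `SecurityException` are illustrative assumptions. The NameNode itself uses its own namesystem lock and Hadoop's `AccessControlException`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy namespace showing "check superuser, then lock"; not FSNamesystem. */
final class NamespaceModel {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final String superUser;
    private long quota;
    private int writeLockAcquisitions; // test-only counter, single-threaded use

    NamespaceModel(String superUser) {
        this.superUser = superUser;
    }

    private void checkSuperuser(String caller) {
        if (!superUser.equals(caller)) {
            throw new SecurityException(
                "Superuser privilege is required, caller: " + caller);
        }
    }

    /** The permission check runs before, not inside, the write lock. */
    void setQuota(String caller, long newQuota) {
        checkSuperuser(caller);        // cheap and lock-free
        fsLock.writeLock().lock();
        try {
            writeLockAcquisitions++;
            quota = newQuota;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    long quota() {
        fsLock.readLock().lock();
        try {
            return quota;
        } finally {
            fsLock.readLock().unlock();
        }
    }

    int writeLockAcquisitions() {
        return writeLockAcquisitions;
    }
}
```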






[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006662#comment-17006662
 ] 

Jinglun commented on HDFS-15087:


Hi [~linyiqun], thanks for your nice comments!
{quote}So how can we ensure that source directory not being changed during that 
time? Or we recommend use HRF only for small paths that won't have frequent 
change? 
{quote}
A simple way to ensure the directory is not changed is to remove all
permissions of the source directory and force recoverLease()/close all open
files. Normal users then can't change the source directory anymore, neither
its directories nor its files. But they can't read it either.

At Xiaomi we also developed a lock technique called INodeLock. We can set an
xattr on an INode. The xattr records a set of prohibited operations and their
scope. When an RPC arrives, the NameNode checks it and rejects the RPC if it
tries a prohibited operation on a path in the scope. We want this INodeLock
because we want only the write operations to be rejected.
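The following is a hypothetical model of the INodeLock check described above. Each lock stands in for the xattr on an INode and holds a set of prohibited operations plus a scope. An incoming operation is rejected if the target path forbids it. For subtree-scoped locks, a forbidding ancestor also rejects it. The `Op` and `Scope` values, the map-based storage, and the ancestor walk are assumptions for illustration; the comment doesn't specify the xattr format.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of INodeLock; not Xiaomi's implementation. */
final class INodeLockModel {
    enum Op { READ, CREATE, APPEND, DELETE, RENAME, SET_PERMISSION }

    enum Scope { SELF, SUBTREE }

    private static final class Lock {
        final Set<Op> prohibited;
        final Scope scope;

        Lock(Set<Op> prohibited, Scope scope) {
            this.prohibited = prohibited;
            this.scope = scope;
        }
    }

    // path -> lock; stands in for an xattr stored on that INode
    private final Map<String, Lock> locks = new HashMap<>();

    void setLock(String path, Set<Op> prohibited, Scope scope) {
        Set<Op> copy = prohibited.isEmpty()
            ? EnumSet.noneOf(Op.class) : EnumSet.copyOf(prohibited);
        locks.put(path, new Lock(copy, scope));
    }

    /** True if an RPC doing {@code op} on {@code path} should be rejected. */
    boolean isRejected(String path, Op op) {
        for (String p = path; p != null; p = parent(p)) {
            Lock lock = locks.get(p);
            if (lock == null || !lock.prohibited.contains(op)) {
                continue;
            }
            if (p.equals(path) || lock.scope == Scope.SUBTREE) {
                return true;
            }
        }
        return false;
    }

    /** Parent of a normalized absolute path; null for the root. */
    private static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }
}
```

With a subtree lock that prohibits only write operations, reads under the source path keep working. That is the advantage over removing all permissions.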




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006660#comment-17006660
 ] 

Jinglun commented on HDFS-15087:


Hi [~ayushtkn], thanks for your nice comments! Yes, FastCopy is a very good
tool. We researched it before we started HFR. It could be very effective for
balancing, but it's too heavyweight if we want to support a normal rename
across namespaces. It depends on YARN, so the time cost is out of our control.
The saveTree()+graftTree()+hardlink way is more lightweight. In our practice,
even renaming a TB-scale path completes within one minute, so the RPC won't
time out.




[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-02 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006657#comment-17006657
 ] 

Jinglun commented on HDFS-15087:


Hi [~elgoiri], thanks for your nice comments!
{quote}Would it be possible to leverage HDFS snapshots instead of blocking 
writes and having the new tree related calls? Intuitively, I would expect for 
snapshots to cover 90% of the features described in the doc. I would try to 
improve snapshots to cover 100%.
{quote}
I'm not familiar with snapshots. My rough thought is that as long as the
snapshot meta can be transferred and rebuilt, HFR can support it. I'll try to
write a demo to transfer and rebuild the snapshot across NameNodes.
{quote}The approach described in the doc requires hard linking. I think this is 
a good idea for the start but I would push to make it pluggable/abstract so in 
the future we can have other implementations.
{quote}
Good idea. The design of HFR has already considered it. HFR is a combination
of many tasks, and each task is pluggable. For example, if we want to use copy
instead of hardlink, we can switch the HardLink task to a CopyReplica task.
{quote}Is hard linking available in Windows?
{quote}
After HADOOP-11483 we use the JDK's Files.createLink() to create the
hardlinks. I tested Files.createLink() on Windows and it works.

See the Java tutorial: [https://docs.oracle.com/javase/tutorial/essential/io/links.html]
