[jira] [Commented] (HDFS-13782) ObserverReadProxyProvider should work with IPFailoverProxyProvider

2018-08-25 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592734#comment-16592734
 ] 

Konstantin Shvachko commented on HDFS-13782:


I just committed this.

> ObserverReadProxyProvider should work with IPFailoverProxyProvider
> --
>
> Key: HDFS-13782
> URL: https://issues.apache.org/jira/browse/HDFS-13782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13782-HDFS-12943.001.patch, 
> HDFS-13782-HDFS-12943.002.patch
>
>
> Currently {{ObserverReadProxyProvider}} is based on 
> {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads 
> in case of {{IPFailoverProxyProvider}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13782) ObserverReadProxyProvider should work with IPFailoverProxyProvider

2018-08-25 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-13782:
---
Release Note:   (was: I just committed this.)

> ObserverReadProxyProvider should work with IPFailoverProxyProvider
> --
>
> Key: HDFS-13782
> URL: https://issues.apache.org/jira/browse/HDFS-13782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13782-HDFS-12943.001.patch, 
> HDFS-13782-HDFS-12943.002.patch
>
>
> Currently {{ObserverReadProxyProvider}} is based on 
> {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads 
> in case of {{IPFailoverProxyProvider}}.






[jira] [Resolved] (HDFS-13782) ObserverReadProxyProvider should work with IPFailoverProxyProvider

2018-08-25 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-13782.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-12943
 Release Note: I just committed this.

> ObserverReadProxyProvider should work with IPFailoverProxyProvider
> --
>
> Key: HDFS-13782
> URL: https://issues.apache.org/jira/browse/HDFS-13782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13782-HDFS-12943.001.patch, 
> HDFS-13782-HDFS-12943.002.patch
>
>
> Currently {{ObserverReadProxyProvider}} is based on 
> {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads 
> in case of {{IPFailoverProxyProvider}}.






[jira] [Comment Edited] (HDFS-13836) RBF: To handle the exception when the mounttable znode have null value.

2018-08-25 Thread yanghuafeng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592616#comment-16592616
 ] 

yanghuafeng edited comment on HDFS-13836 at 8/25/18 3:10 PM:
-

I have found that it may be better to handle the null value in 
ZkCuratorManager.getString(). But the method getString() can throw an 
exception, including an NPE. In StateStoreZooKeeperImpl.get() we already catch 
the Exception, but we only log the error and do not delete the corrupted 
znode. We could also check for the NPE in the catch clause there and delete 
the znode. For now we just check for null in advance. Comparing that with 
handling the NPE in ZkCuratorManager, I am not sure which is better.

{code:java}
try {
  String path = getNodePath(znode, child);
  Stat stat = new Stat();
  String data = zkManager.getStringData(path, stat);
  // ...
} catch (Exception e) {
  LOG.error("Cannot get data for {}: {}", child, e.getMessage());
}
{code}
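
For illustration, here is a minimal sketch of the "check for null in advance" 
approach described above. It only shows the shape of the fix; the delete helper 
used for the corrupted znode is a placeholder, not the actual 
StateStoreZooKeeperImpl code.

{code:java}
try {
  String path = getNodePath(znode, child);
  Stat stat = new Stat();
  String data = zkManager.getStringData(path, stat);
  if (data == null) {
    // The znode exists but its data was never set (e.g. the Router died after
    // create() but before setData()). Treat it as corrupted: log it and remove
    // it so the same mount table entry can be added again later.
    LOG.warn("Removing corrupted znode with null data: {}", path);
    zkManager.delete(path); // placeholder for whatever delete helper is available
  } else {
    // ... deserialize the record from 'data' as before ...
  }
} catch (Exception e) {
  LOG.error("Cannot get data for {}: {}", child, e.getMessage());
}
{code}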


was (Author: hfyang20071):
I have found that it may be better to handle the null value in 
ZkCuratorManager.getString(). But the method getString() can throw an 
exception, including an NPE. In StateStoreZooKeeperImpl.get() we already catch 
the Exception, but we only log the error and do not delete the corrupted 
znode. We could also check for the NPE in the catch clause there and delete 
the znode. For now we just check for null in advance. So I am not sure which 
is better.

{code:java}
try {
  String path = getNodePath(znode, child);
  Stat stat = new Stat();
  String data = zkManager.getStringData(path, stat);
  // ...
} catch (Exception e) {
  LOG.error("Cannot get data for {}: {}", child, e.getMessage());
}
{code}

> RBF: To handle the exception when the mounttable znode have null value.
> ---
>
> Key: HDFS-13836
> URL: https://issues.apache.org/jira/browse/HDFS-13836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation, hdfs
>Affects Versions: 3.1.0
>Reporter: yanghuafeng
>Assignee: yanghuafeng
>Priority: Major
> Fix For: 2.9.0, 3.0.0, 3.1.0, 3.2.0
>
> Attachments: HDFS-13836.001.patch, HDFS-13836.002.patch, 
> HDFS-13836.003.patch, HDFS-13836.004.patch
>
>
> When we were adding a mount table entry, the router server was terminated. 
> Some error messages show up in the log, as follows:
>  2018-08-20 14:18:32,404 ERROR 
> org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl:
>  Cannot get data for 0SLASH0testzk: null. 
> The reason is that the router server had created the znode but had not set 
> its data before being terminated. The method zkManager.getStringData(path, stat) 
> will throw an NPE if the znode has a null value in StateStoreZooKeeperImpl, 
> so both re-adding the same mount table entry and deleting the existing znode 
> fail.






[jira] [Commented] (HDFS-13836) RBF: To handle the exception when the mounttable znode have null value.

2018-08-25 Thread yanghuafeng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592616#comment-16592616
 ] 

yanghuafeng commented on HDFS-13836:


I have found that it may be better to handle the null value in 
ZkCuratorManager.getString(). But the method getString() can throw an 
exception, including an NPE. In StateStoreZooKeeperImpl.get() we already catch 
the Exception, but we only log the error and do not delete the corrupted 
znode. We could also check for the NPE in the catch clause there and delete 
the znode. For now we just check for null in advance. So I am not sure which 
is better.

{code:java}
try {
  String path = getNodePath(znode, child);
  Stat stat = new Stat();
  String data = zkManager.getStringData(path, stat);
  // ...
} catch (Exception e) {
  LOG.error("Cannot get data for {}: {}", child, e.getMessage());
}
{code}

> RBF: To handle the exception when the mounttable znode have null value.
> ---
>
> Key: HDFS-13836
> URL: https://issues.apache.org/jira/browse/HDFS-13836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation, hdfs
>Affects Versions: 3.1.0
>Reporter: yanghuafeng
>Assignee: yanghuafeng
>Priority: Major
> Fix For: 2.9.0, 3.0.0, 3.1.0, 3.2.0
>
> Attachments: HDFS-13836.001.patch, HDFS-13836.002.patch, 
> HDFS-13836.003.patch, HDFS-13836.004.patch
>
>
> When we were adding a mount table entry, the router server was terminated. 
> Some error messages show up in the log, as follows:
>  2018-08-20 14:18:32,404 ERROR 
> org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl:
>  Cannot get data for 0SLASH0testzk: null. 
> The reason is that the router server had created the znode but had not set 
> its data before being terminated. The method zkManager.getStringData(path, stat) 
> will throw an NPE if the znode has a null value in StateStoreZooKeeperImpl, 
> so both re-adding the same mount table entry and deleting the existing znode 
> fail.






[jira] [Commented] (HDFS-13844) Refactor the fmt_bytes function in the dfs-dust.js.

2018-08-25 Thread yanghuafeng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592600#comment-16592600
 ] 

yanghuafeng commented on HDFS-13844:


It is just an example, because it is difficult to reach a capacity in our 
environment that would need the last unit, ZB. We can shrink the list of units 
and simulate a situation that overflows the last unit to illustrate the problem.
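
To make the overflow concrete, here is a rough Java sketch of the formatting 
logic being discussed (the real function, fmt_bytes, lives in dfs-dust.js and 
is JavaScript); it simply clamps at the last known unit instead of walking past 
the end of the unit table and printing an undefined unit. The unit table and 
format string below are illustrative, not copied from dfs-dust.js.

{code:java}
// Rough sketch of the fmt_bytes idea: include EB and clamp at the last unit.
static String formatBytes(double v) {
  final String[] units = {"B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"};
  int i = 0;
  // Stop dividing once the last unit is reached, even if v is still >= 1024,
  // so an oversized value renders as e.g. "8192 GB" instead of "8 undefined".
  while (v >= 1024 && i < units.length - 1) {
    v /= 1024;
    i++;
  }
  return String.format("%.2f %s", v, units[i]);
}
{code}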

> Refactor the fmt_bytes function in the dfs-dust.js.
> ---
>
> Key: HDFS-13844
> URL: https://issues.apache.org/jira/browse/HDFS-13844
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, ui
>Affects Versions: 1.2.0, 2.2.0, 2.7.2, 3.0.0, 3.1.0
>Reporter: yanghuafeng
>Assignee: yanghuafeng
>Priority: Minor
> Attachments: HDFS-13844.001.patch, overflow_undefined_unit.jpg, 
> overflow_unit.jpg, undefined_unit.jpg
>
>
> The namenode WebUI cannot display the capacity with correct units. I have 
> found that the function fmt_bytes in dfs-dust.js is missing the EB unit. This 
> leads to an undefined unit in the UI.
> And although the unit ZB is very large, we should still take unit overflow into 
> consideration. Supposing the last unit were GB, a total capacity of 8 TB should 
> be displayed as 8192 GB rather than "8 undefined".






[jira] [Commented] (HDFS-13854) RBF: The ProcessingAvgTime and ProxyAvgTime should display by JMX with ms unit.

2018-08-25 Thread yanghuafeng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592596#comment-16592596
 ] 

yanghuafeng commented on HDFS-13854:


It is not wrong; this is just an improvement to unify the time unit. In the 
previous code, nanoseconds are not a good unit to display in JMX. Since we 
already provide toMS() to convert the unit and the nanosecond precision is not 
really essential, wouldn't it be best to unify the proxy time unit to ms when 
storing it?
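
As a minimal sketch of the unification being proposed (not the actual 
FederationRPCMetrics code; the class, field, and method names here are 
placeholders), the conversion to milliseconds would happen once, when the 
sample is recorded:

{code:java}
import java.util.concurrent.TimeUnit;

class ProxyTimeSketch {
  // Stored directly in ms, so JMX/Ganglia consumers need no further conversion.
  private long totalProxyTimeMs;

  void addProxyTime(long durationNanos) {
    // Convert at recording time instead of exposing raw nanoseconds.
    totalProxyTimeMs += TimeUnit.NANOSECONDS.toMillis(durationNanos);
  }

  long getTotalProxyTimeMs() {
    return totalProxyTimeMs;
  }
}
{code}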

> RBF: The ProcessingAvgTime and ProxyAvgTime should display by JMX with ms 
> unit.
> ---
>
> Key: HDFS-13854
> URL: https://issues.apache.org/jira/browse/HDFS-13854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation, hdfs
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: yanghuafeng
>Assignee: yanghuafeng
>Priority: Major
> Attachments: HDFS-13854.001.patch, HDFS-13854.002.patch, 
> ganglia_jmx_compare1.jpg, ganglia_jmx_compare2.jpg
>
>
> In FederationRPCMetrics, the proxy time and processing time should be exposed 
> to JMX or Ganglia in ms units. Although the method toMS() exists, we cannot 
> get the correct proxy time and processing time through JMX and Ganglia.






[jira] [Commented] (HDDS-247) Handle CLOSED_CONTAINER_IO exception in ozoneClient

2018-08-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592566#comment-16592566
 ] 

genericqa commented on HDDS-247:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
2s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 15s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeys |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-247 |
| JIRA Patch URL | 
htt

[jira] [Commented] (HDFS-13830) Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting snasphottable directory list

2018-08-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592562#comment-16592562
 ] 

genericqa commented on HDFS-13830:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
12s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
44s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
5s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
16s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
34s{color} | {color:green} branch-3.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 43s{color} | {color:orange} root: The patch generated 2 new + 281 unchanged 
- 2 fixed = 283 total (was 283) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 35s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  
0s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
38s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 21s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}217m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:1776208 |
| JIRA Issue | HDFS-13830 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12937128/HDFS-13830.branch-3.0.004.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 36d705a03007 4.4.0-133-generic #159-Ubuntu SMP 

[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-08-25 Thread lindongdong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592561#comment-16592561
 ] 

lindongdong commented on HDFS-13671:


Hi [~kihwal], how is the revert work going? 

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation. 
> Looking into this, we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when dealing with FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance the tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> with a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not 
> updates. Maybe we can revert this to the earlier implementation.
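
For reference, a minimal sketch of the chunked remove-block loop described in 
the two steps above, assuming the blocks have already been collected in step 
one; the generic element type, chunk size, and method name are placeholders, 
not the actual FSNamesystem/BlockManager code.

{code:java}
// Placeholder sketch of "remove blocks chunk by chunk in a loop".
static <B> void removeCollectedBlocks(java.util.List<B> collectedBlocks,
                                      java.util.Set<B> blocksMap) {
  final int CHUNK = 1000; // hypothetical batch size
  for (int start = 0; start < collectedBlocks.size(); start += CHUNK) {
    int end = Math.min(start + CHUNK, collectedBlocks.size());
    for (B b : collectedBlocks.subList(start, end)) {
      // Each remove() on a balanced structure like FoldedTreeSet may rebalance,
      // which is where the extra time goes when many blocks are deleted at once.
      blocksMap.remove(b);
    }
    // The real code typically releases and re-acquires the namesystem lock
    // between chunks so that other operations can make progress.
  }
}
{code}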






[jira] [Commented] (HDDS-247) Handle CLOSED_CONTAINER_IO exception in ozoneClient

2018-08-25 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592540#comment-16592540
 ] 

Shashikant Banerjee commented on HDDS-247:
--

Thanks [~msingh] for the review. Patch v11 addresses your review comments.

> Handle CLOSED_CONTAINER_IO exception in ozoneClient
> ---
>
> Key: HDDS-247
> URL: https://issues.apache.org/jira/browse/HDDS-247
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-247.00.patch, HDDS-247.01.patch, HDDS-247.02.patch, 
> HDDS-247.03.patch, HDDS-247.04.patch, HDDS-247.05.patch, HDDS-247.06.patch, 
> HDDS-247.07.patch, HDDS-247.08.patch, HDDS-247.09.patch, HDDS-247.10.patch, 
> HDDS-247.11.patch
>
>
> In case of ongoing writes by an Ozone client to a container, the container might 
> get closed on the Datanodes because of node loss, out-of-space issues, etc. In 
> such cases, the operation will fail with a CLOSED_CONTAINER_IO exception. When 
> that happens, the Ozone client should try to get the committed length of the 
> block from the Datanodes and update the OM. This Jira aims to address this 
> issue.
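
For context, a rough sketch of the recovery flow described above; every type 
and method name here (the exception class, BlockClient, etc.) is a hypothetical 
stand-in, not the real Ozone client API, and the write is assumed to start at 
block offset 0 for simplicity.

{code:java}
// Hypothetical sketch of the CLOSED_CONTAINER_IO recovery flow.
class ClosedContainerRecoverySketch {

  static class ClosedContainerIoException extends RuntimeException { }

  interface BlockClient {
    void write(byte[] data) throws ClosedContainerIoException;
    long getCommittedBlockLength();            // ask the Datanodes what was persisted
    void updateCommittedLengthInOm(long len);  // record the committed length in OM
    BlockClient allocateNewBlock();            // continue the write on a new block
  }

  static void writeWithRecovery(BlockClient block, byte[] data) {
    try {
      block.write(data);
    } catch (ClosedContainerIoException e) {
      // The container was closed mid-write: find out how much was actually
      // committed, update the OM, then write the remaining bytes to a new block.
      long committed = block.getCommittedBlockLength();
      block.updateCommittedLengthInOm(committed);
      byte[] remaining =
          java.util.Arrays.copyOfRange(data, (int) committed, data.length);
      block.allocateNewBlock().write(remaining);
    }
  }
}
{code}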






[jira] [Updated] (HDDS-247) Handle CLOSED_CONTAINER_IO exception in ozoneClient

2018-08-25 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-247:
-
Attachment: HDDS-247.11.patch

> Handle CLOSED_CONTAINER_IO exception in ozoneClient
> ---
>
> Key: HDDS-247
> URL: https://issues.apache.org/jira/browse/HDDS-247
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-247.00.patch, HDDS-247.01.patch, HDDS-247.02.patch, 
> HDDS-247.03.patch, HDDS-247.04.patch, HDDS-247.05.patch, HDDS-247.06.patch, 
> HDDS-247.07.patch, HDDS-247.08.patch, HDDS-247.09.patch, HDDS-247.10.patch, 
> HDDS-247.11.patch
>
>
> In case of ongoing writes by an Ozone client to a container, the container might 
> get closed on the Datanodes because of node loss, out-of-space issues, etc. In 
> such cases, the operation will fail with a CLOSED_CONTAINER_IO exception. When 
> that happens, the Ozone client should try to get the committed length of the 
> block from the Datanodes and update the OM. This Jira aims to address this 
> issue.






[jira] [Commented] (HDDS-247) Handle CLOSED_CONTAINER_IO exception in ozoneClient

2018-08-25 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592528#comment-16592528
 ] 

Mukul Kumar Singh commented on HDDS-247:


Thanks for working on this [~shashikant]. The latest patch looks really good to 
me. Some really minor comments. I am +1 on the patch after that.

1) ChunkGroupOutputStream:259,260,276,277,396,648 this is an unrelated change
2) ChunkGroupOutputStream:292-300, the TODO should be moved inside 
handleCloseContainerException
3) ChunkGroupOutputStream:631 setCurrentPosition is not used, can we remove 
this?
4) ChunkOutputStream:113-114 unrelated change.
5) TestCloseContainerHandlingByClient#validateData, the input stream is not 
closed here
6) TestCloseContainerHandlingByClient#waitForContainerClose, the wait-for-close 
loop should be a separate loop; this will help in closing multiple containers 
in one iteration faster.
7) TestCloseContainerHandlingByClient#95, I feel fixedLengthString can be 
removed and replaced with RandomStringUtils.random() to generate key data; this 
will help in validating with random data.
8) TestOmBlockVersioning, wildcard import


> Handle CLOSED_CONTAINER_IO exception in ozoneClient
> ---
>
> Key: HDDS-247
> URL: https://issues.apache.org/jira/browse/HDDS-247
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-247.00.patch, HDDS-247.01.patch, HDDS-247.02.patch, 
> HDDS-247.03.patch, HDDS-247.04.patch, HDDS-247.05.patch, HDDS-247.06.patch, 
> HDDS-247.07.patch, HDDS-247.08.patch, HDDS-247.09.patch, HDDS-247.10.patch
>
>
> In case of ongoing writes by an Ozone client to a container, the container might 
> get closed on the Datanodes because of node loss, out-of-space issues, etc. In 
> such cases, the operation will fail with a CLOSED_CONTAINER_IO exception. When 
> that happens, the Ozone client should try to get the committed length of the 
> block from the Datanodes and update the OM. This Jira aims to address this 
> issue.






[jira] [Commented] (HDFS-13830) Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting snasphottable directory list

2018-08-25 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592519#comment-16592519
 ] 

Siyao Meng commented on HDFS-13830:
---

Thanks [~jojochuang] for the comment.

Removed HDFS-13280 patch in rev 004.

> Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting 
> snasphottable directory list
> 
>
> Key: HDFS-13830
> URL: https://issues.apache.org/jira/browse/HDFS-13830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.3
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13830.branch-3.0.001.patch, 
> HDFS-13830.branch-3.0.002.patch, HDFS-13830.branch-3.0.003.patch, 
> HDFS-13830.branch-3.0.004.patch
>
>
> HDFS-13141 conflicts with 3.0.3 because of an interface change in HdfsFileStatus.
> This Jira aims to backport the WebHDFS getSnapshottableDirListing() support 
> to branch-3.0.






[jira] [Updated] (HDFS-13830) Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting snasphottable directory list

2018-08-25 Thread Siyao Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-13830:
--
Attachment: HDFS-13830.branch-3.0.004.patch
Status: Patch Available  (was: In Progress)

> Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting 
> snasphottable directory list
> 
>
> Key: HDFS-13830
> URL: https://issues.apache.org/jira/browse/HDFS-13830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.3
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13830.branch-3.0.001.patch, 
> HDFS-13830.branch-3.0.002.patch, HDFS-13830.branch-3.0.003.patch, 
> HDFS-13830.branch-3.0.004.patch
>
>
> HDFS-13141 conflicts with 3.0.3 because of an interface change in HdfsFileStatus.
> This Jira aims to backport the WebHDFS getSnapshottableDirListing() support 
> to branch-3.0.






[jira] [Updated] (HDFS-13830) Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting snasphottable directory list

2018-08-25 Thread Siyao Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-13830:
--
Status: In Progress  (was: Patch Available)

> Backport HDFS-13141 to branch-3.0: WebHDFS: Add support for getting 
> snasphottable directory list
> 
>
> Key: HDFS-13830
> URL: https://issues.apache.org/jira/browse/HDFS-13830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.0.3
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13830.branch-3.0.001.patch, 
> HDFS-13830.branch-3.0.002.patch, HDFS-13830.branch-3.0.003.patch
>
>
> HDFS-13141 conflicts with 3.0.3 because of an interface change in HdfsFileStatus.
> This Jira aims to backport the WebHDFS getSnapshottableDirListing() support 
> to branch-3.0.


