[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Summary: RBF: Use the ZooKeeper as the default State Store  (was: RBF: 
Refactor RBF relevant settings)

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch
>
>
> Currently, the State Store Driver-related settings are only defined in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings and 
> understand how to use them.
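A rough sketch of what the proposed move could look like (constant names copied from the snippet above; the resolved key string and prefix value are assumptions, not taken from any patch):
{noformat}
// Sketch only: the driver keys hosted in DFSConfigKeys, with the matching
// hdfs-default.xml entry shown as a comment. FEDERATION_STORE_PREFIX already
// exists in DFSConfigKeys per the snippet above.
public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
    FEDERATION_STORE_PREFIX + "driver.zk.";
public static final String FEDERATION_STORE_ZK_PARENT_PATH =
    FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
    "/hdfs-federation";

// hdfs-default.xml would then document the same key, for example:
// <property>
//   <name>dfs.federation.router.store.driver.zk.parent-path</name>
//   <value>/hdfs-federation</value>
//   <description>Parent znode of the ZooKeeper-based State Store.</description>
// </property>
{noformat}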






[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Attachment: HDFS-13099.004.patch

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently, the State Store Driver-related settings are only defined in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings and 
> understand how to use them.






[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355112#comment-16355112
 ] 

Yiqun Lin commented on HDFS-13099:
--

Looks like the change in the impl class broke the unit tests.
{noformat}
2018-02-07 04:43:43,016 [main] ERROR impl.StateStoreZooKeeperImpl 
(StateStoreZooKeeperImpl.java:initDriver(85)) - Cannot initialize the ZK 
connection
java.io.IOException: hadoop.zk.address is not configured.
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:128)
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:115)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.initDriver(StateStoreZooKeeperImpl.java:83)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.init(StateStoreDriver.java:74)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreSerializableImpl.init(StateStoreSerializableImpl.java:47)
{noformat}
This seems to be an incompatible change. Attaching the updated patch.
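For reference, a minimal sketch of the test wiring the error above implies, assuming the refactored driver now reads the shared "hadoop.zk.address" key (curator's TestingServer is just one way to provide an embedded ZooKeeper; the RBF-specific test helpers are omitted):
{noformat}
import org.apache.curator.test.TestingServer;
import org.apache.hadoop.conf.Configuration;

public class ZkStateStoreTestSetup {
  private TestingServer zkServer;

  public Configuration start() throws Exception {
    zkServer = new TestingServer();                     // starts an embedded ZooKeeper
    Configuration conf = new Configuration();
    // The key named in the error above must be set before the driver is initialized.
    conf.set("hadoop.zk.address", zkServer.getConnectString());
    return conf;
  }

  public void stop() throws Exception {
    zkServer.close();                                   // shut the embedded ZooKeeper down in teardown
  }
}
{noformat}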

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently, the State Store Driver-related settings are only defined in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings and 
> understand how to use them.






[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Labels: incompatible incompatibleChange  (was: )

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently, the State Store Driver-related settings are only defined in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings and 
> understand how to use them.






[jira] [Comment Edited] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355112#comment-16355112
 ] 

Yiqun Lin edited comment on HDFS-13099 at 2/7/18 8:21 AM:
--

Looks like the change in the impl class broke the unit tests.
{noformat}
2018-02-07 04:43:43,016 [main] ERROR impl.StateStoreZooKeeperImpl 
(StateStoreZooKeeperImpl.java:initDriver(85)) - Cannot initialize the ZK 
connection
java.io.IOException: hadoop.zk.address is not configured.
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:128)
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:115)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.initDriver(StateStoreZooKeeperImpl.java:83)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.init(StateStoreDriver.java:74)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreSerializableImpl.init(StateStoreSerializableImpl.java:47)
{noformat}
This seems to be an incompatible change. Attaching the updated patch.
[~elgoiri], it would be great if you could review this again.


was (Author: linyiqun):
Looks like the change in the impl class broke the unit tests.
{noformat}
2018-02-07 04:43:43,016 [main] ERROR impl.StateStoreZooKeeperImpl 
(StateStoreZooKeeperImpl.java:initDriver(85)) - Cannot initialize the ZK 
connection
java.io.IOException: hadoop.zk.address is not configured.
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:128)
at 
org.apache.hadoop.util.curator.ZKCuratorManager.start(ZKCuratorManager.java:115)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.initDriver(StateStoreZooKeeperImpl.java:83)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.init(StateStoreDriver.java:74)
at 
org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreSerializableImpl.init(StateStoreSerializableImpl.java:47)
{noformat}
This seems to be an incompatible change. Attaching the updated patch.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently, the State Store Driver-related settings are only defined in its 
> implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings and 
> understand how to use them.






[jira] [Commented] (HDFS-12636) Ozone: OzoneFileSystem: Implement seek functionality for rpc client

2018-02-07 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355126#comment-16355126
 ] 

Mukul Kumar Singh commented on HDFS-12636:
--

Thanks for working on this, [~ljain].

1) OzoneFSInputStream:21, can you please remove the wildcard here and import 
the specific classes from "java.io"? (A small illustration follows after this list.)
2) OzoneFSOutputStream:57, committing the key should be done as part of 
outputstream.close(). Any reason for the comment there?
3) ChunkInputStream.java: please add comments around chunkOffset to explain 
that it is the offset of a chunk inside a key.
4) ChunkGroupInputStream.java: should we add checkNotClosed() for the input stream 
as well? Also, in the implementation in ChunkGroupOutputStream, should the key name 
be added as well?
5) ChunkGroupInputStream.java, 162-164: can you please add some comments here?
7) OzoneFileSystem.java:41, please remove the wildcards here.
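As a small illustration of item 1 (class and member names below are placeholders, not the Ozone code):
{noformat}
// Replace the wildcard
//   import java.io.*;
// with explicit imports of only the types the stream class actually uses.
import java.io.IOException;
import java.io.InputStream;

public class ExplicitImportExample extends InputStream {
  @Override
  public int read() throws IOException {
    return -1;   // placeholder body: end of stream
  }
}
{noformat}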


> Ozone: OzoneFileSystem: Implement seek functionality for rpc client
> ---
>
> Key: HDFS-12636
> URL: https://issues.apache.org/jira/browse/HDFS-12636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12636-HDFS-7240.001.patch, 
> HDFS-12636-HDFS-7240.002.patch, HDFS-12636-HDFS-7240.003.patch, 
> HDFS-12636-HDFS-7240.004.patch, HDFS-12636-HDFS-7240.005.patch, 
> HDFS-12636-HDFS-7240.006.patch
>
>
> The OzoneClient library provides a method to invoke both RPC- and REST-based 
> methods to Ozone. This API will help in improving both the performance and the 
> interface management in OzoneFileSystem.
> This jira will be used to convert the REST-based calls to use this new 
> unified client.






[jira] [Commented] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355128#comment-16355128
 ] 

genericqa commented on HDFS-13097:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
38s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
35s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
43s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
30s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
24s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13097 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909566/HDFS-13097-HDFS-10285.04.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 63ea85a67580 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality

[jira] [Commented] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)

2018-02-07 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355144#comment-16355144
 ] 

Uma Maheswara Rao G commented on HDFS-13097:


Thank you for the patch. +1 on the latest patch

> [SPS]: Fix the branch review comments(Part1)
> 
>
> Key: HDFS-13097
> URL: https://issues.apache.org/jira/browse/HDFS-13097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-13097-HDFS-10285.01.patch, 
> HDFS-13097-HDFS-10285.02.patch, HDFS-13097-HDFS-10285.03.patch, 
> HDFS-13097-HDFS-10285.04.patch
>
>
> Fix the branch review comments. Please refer to HDFS-10285 for the more detailed 
> [discussion|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472].
> *Comment-1)*
> {quote}BlockManager
>  Shouldn’t spsMode be volatile? Although I question why it’s here.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-2)*
> {quote}Adding SPS methods to this class implies an unexpected coupling of the 
> SPS service to the block manager. Please move them out to prove it’s not 
> tightly coupled.
> {quote}
> [Rakesh's reply] Agreed. I'm planning to create 
> {{StoragePolicySatisfyManager}} and keep all the related apis over there.
> *Comment-5)*
> {quote}DatanodeDescriptor
>  Why use a synchronized linked list to offer/poll instead of BlockingQueue?
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-8)*
> {quote}DFSUtil
>  DFSUtil.removeOverlapBetweenStorageTypes and {{DFSUtil.getSPSWorkMultiplier
>  }}. These aren’t generally useful methods so why are they in DFSUtil? Why 
> aren’t they in the only calling class StoragePolicySatisfier?
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-11)*
> {quote}HdfsServerConstants
>  The xattr is called user.hdfs.sps.xattr. Why does the xattr name actually 
> contain the word “xattr”?
> {quote}
> [Rakesh's reply] Sure, will remove “xattr” word.
> *Comment-12)*
> {quote}NameNode
>  Super trivial but using the plural pronoun “we” in this exception message is 
> odd. Changing the value isn’t a joint activity.
> For enabling or disabling storage policy satisfier, we must pass either 
> none/internal/external string value only
> {quote}
> [Rakesh's reply] oops, sorry for the mistake. Will change it.
> *Comment-15)*
> {quote}FSDirSatisfyStoragePolicyOp
>  satisfyStoragePolicy errors if the xattr is already present. Should this be 
> a no-op? A client re-requesting a storage policy correction probably 
> shouldn't fail.
> unprotectedSatisfyStoragePolicy is called prior to xattr updates, which calls 
> addSPSPathId. To avoid race conditions or inconsistent state if the xattr 
> fails, should call addSPSPathId after xattrs are successfully updated.
> inodeHasSatisfyXAttr calls getXAttrFeature then immediately shorts out if the 
> inode isn't a file. Should do file check first to avoid unnecessary 
> computation.
> In general, not fond of unnecessary guava. Instead of 
> newArrayListWithCapacity + add, standard Arrays.asList(item) is more succinct.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-16)*
> {quote}FSDirStatAndListOp
>  Not sure why javadoc was changed to add needLocation. It's already present 
> and now doubled up.
> {quote}
> [Rakesh's reply] Agreed, will correct it.
>  
>  *Comment-18)*
> {quote}DFS_MOVER_MOVERTHREADS_DEFAULT is 1000 per DN? If the DN is 
> concurrently doing 1000 moves, it's not in a good state, disk io is probably 
> saturated, and this will only make it much worse. 10 is probably more than 
> sufficient.
> {quote}
> [Rakesh's reply] Agreed, will make it a smaller value, 10.
>   
>  *Comment-22)*
> {quote}StoragePolicySatisfier
>  Should handleException use a double-checked lock to avoid synchronization? 
> Unexpected exceptions should be a rarity, right?
>  Speaking of which, it’s not safe to ignore all Throwable in the run loop! 
> You have no idea if data structures are in a sane or consistent state.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
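As a generic sketch of what Comment-5 above is suggesting (names are placeholders, not the actual DatanodeDescriptor code):
{noformat}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A BlockingQueue gives thread-safe offer/poll directly, instead of wrapping
// a LinkedList in synchronization.
public class BlockMoveTaskQueue {
  private final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();

  public boolean offer(String task) {
    return tasks.offer(task);   // non-blocking enqueue
  }

  public String poll() {
    return tasks.poll();        // non-blocking dequeue; null when empty
  }
}
{noformat}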






[jira] [Created] (HDFS-13118) SnapshotDiffReport should provide the INode type

2018-02-07 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-13118:
-

 Summary: SnapshotDiffReport should provide the INode type
 Key: HDFS-13118
 URL: https://issues.apache.org/jira/browse/HDFS-13118
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Ewan Higgs


Currently the snapshot diff report will list which inodes were added, removed, 
renamed, etc. But to see what an INode actually is, we need to access the 
underlying snapshot itself - and this is cumbersome to do programmatically when 
the snapshot diff already has the information.
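A rough sketch of the workaround being described, assuming a snapshottable directory with the two snapshot names passed in; entries deleted between the snapshots would have to be resolved against the older snapshot instead:
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSUtilClient;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;

public class SnapshotDiffInodeTypeLookup {
  // For every diff entry, stat the path inside the "to" snapshot just to learn
  // whether it is a file or a directory -- the extra round trip the report
  // could avoid if it carried the INode type itself.
  public static void printEntryTypes(DistributedFileSystem dfs, Path snapshotDir,
      String fromSnapshot, String toSnapshot) throws IOException {
    SnapshotDiffReport report =
        dfs.getSnapshotDiffReport(snapshotDir, fromSnapshot, toSnapshot);
    for (SnapshotDiffReport.DiffReportEntry entry : report.getDiffList()) {
      String relPath = DFSUtilClient.bytes2String(entry.getSourcePath());
      Path inSnapshot = new Path(snapshotDir, ".snapshot/" + toSnapshot + "/" + relPath);
      if (dfs.exists(inSnapshot)) {   // deleted entries only exist in the older snapshot
        System.out.println(entry.getType() + " " + relPath + " isDirectory="
            + dfs.getFileStatus(inSnapshot).isDirectory());
      }
    }
  }
}
{noformat}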






[jira] [Updated] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-11701:
---
Attachment: HDFS-11701.004.patch

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shut down and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations, the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.
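A minimal sketch of the kind of guard the last paragraph asks for (not the actual patch): check whether a cached datanode address still resolves, and treat an unresolved host as a signal to refetch block locations instead of hitting the NPE path.
{noformat}
import java.net.InetSocketAddress;

import org.apache.hadoop.net.NetUtils;

public class UnresolvedHostGuard {
  public static boolean shouldRefetchLocations(String dnAddr) {
    InetSocketAddress target = NetUtils.createSocketAddr(dnAddr);
    // isUnresolved() is true when the hostname cannot be resolved in DNS --
    // exactly the case described above that used to NPE in DFSClient.isLocalAddress().
    return target.isUnresolved();
  }
}
{noformat}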






[jira] [Updated] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-11701:
---
Attachment: (was: HDFS-11701.004.patch)

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shut down and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations, the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.






[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355241#comment-16355241
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
 8s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
32s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
50s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 
0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
42s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}213m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.ozone.scm.TestSCMCli |
|   | hadoop.ozone.TestStorageContainerManager |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
|   | hadoop.ozone.scm.node.TestContainerPlacement |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-c

[jira] [Updated] (HDFS-13097) [SPS]: Fix the branch review comments(Part1)

2018-02-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-13097:
---
   Resolution: Fixed
Fix Version/s: HDFS-10285
   Status: Resolved  (was: Patch Available)

I have just pushed it to the branch.

> [SPS]: Fix the branch review comments(Part1)
> 
>
> Key: HDFS-13097
> URL: https://issues.apache.org/jira/browse/HDFS-13097
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: HDFS-10285
>
> Attachments: HDFS-13097-HDFS-10285.01.patch, 
> HDFS-13097-HDFS-10285.02.patch, HDFS-13097-HDFS-10285.03.patch, 
> HDFS-13097-HDFS-10285.04.patch
>
>
> Fix the branch review comments. Please refer to HDFS-10285 for the more detailed 
> [discussion|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472].
> *Comment-1)*
> {quote}BlockManager
>  Shouldn’t spsMode be volatile? Although I question why it’s here.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-2)*
> {quote}Adding SPS methods to this class implies an unexpected coupling of the 
> SPS service to the block manager. Please move them out to prove it’s not 
> tightly coupled.
> {quote}
> [Rakesh's reply] Agreed. I'm planning to create 
> {{StoragePolicySatisfyManager}} and keep all the related apis over there.
> *Comment-5)*
> {quote}DatanodeDescriptor
>  Why use a synchronized linked list to offer/poll instead of BlockingQueue?
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-8)*
> {quote}DFSUtil
>  DFSUtil.removeOverlapBetweenStorageTypes and {{DFSUtil.getSPSWorkMultiplier
>  }}. These aren’t generally useful methods so why are they in DFSUtil? Why 
> aren’t they in the only calling class StoragePolicySatisfier?
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-11)*
> {quote}HdfsServerConstants
>  The xattr is called user.hdfs.sps.xattr. Why does the xattr name actually 
> contain the word “xattr”?
> {quote}
> [Rakesh's reply] Sure, will remove “xattr” word.
> *Comment-12)*
> {quote}NameNode
>  Super trivial but using the plural pronoun “we” in this exception message is 
> odd. Changing the value isn’t a joint activity.
> For enabling or disabling storage policy satisfier, we must pass either 
> none/internal/external string value only
> {quote}
> [Rakesh's reply] oops, sorry for the mistake. Will change it.
> *Comment-15)*
> {quote}FSDirSatisfyStoragePolicyOp
>  satisfyStoragePolicy errors if the xattr is already present. Should this be 
> a no-op? A client re-requesting a storage policy correction probably 
> shouldn't fail.
> unprotectedSatisfyStoragePolicy is called prior to xattr updates, which calls 
> addSPSPathId. To avoid race conditions or inconsistent state if the xattr 
> fails, should call addSPSPathId after xattrs are successfully updated.
> inodeHasSatisfyXAttr calls getXAttrFeature then immediately shorts out if the 
> inode isn't a file. Should do file check first to avoid unnecessary 
> computation.
> In general, not fond of unnecessary guava. Instead of 
> newArrayListWithCapacity + add, standard Arrays.asList(item) is more succinct.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.
> *Comment-16)*
> {quote}FSDirStatAndListOp
>  Not sure why javadoc was changed to add needLocation. It's already present 
> and now doubled up.
> {quote}
> [Rakesh's reply] Agreed, will correct it.
>  
>  *Comment-18)*
> {quote}DFS_MOVER_MOVERTHREADS_DEFAULT is 1000 per DN? If the DN is 
> concurrently doing 1000 moves, it's not in a good state, disk io is probably 
> saturated, and this will only make it much worse. 10 is probably more than 
> sufficient.
> {quote}
> [Rakesh's reply] Agreed, will make it a smaller value, 10.
>   
>  *Comment-22)*
> {quote}StoragePolicySatisfier
>  Should handleException use a double-checked lock to avoid synchronization? 
> Unexpected exceptions should be a rarity, right?
>  Speaking of which, it’s not safe to ignore all Throwable in the run loop! 
> You have no idea if data structures are in a sane or consistent state.
> {quote}
> [Rakesh's reply] Agreed, will do the changes.






[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-01.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the assumption 
> that any and all IOEs mean FNF, which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}
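A generic illustration of the concern in Comment-10 (the interface and method names below are hypothetical stand-ins, not the SPS code):
{noformat}
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for the lookup described in Comment-10.
interface FileInfoSource {
  Object getFileInfo(String path) throws IOException;
}

class FileInfoLookup {
  // The pattern being flagged: every IOException (including plain RPC
  // failures) is silently read as "file not found".
  static Object swallowAllIoes(FileInfoSource src, String path) {
    try {
      return src.getFileInfo(path);
    } catch (IOException e) {
      return null;
    }
  }

  // Narrower handling: only FileNotFoundException means the file is absent;
  // other IOExceptions propagate so callers can distinguish transient errors.
  static Object treatOnlyFnfAsMissing(FileInfoSource src, String path)
      throws IOException {
    try {
      return src.getFileInfo(path);
    } catch (FileNotFoundException e) {
      return null;
    }
  }
}
{noformat}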






[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Status: Patch Available  (was: Open)

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the assumption 
> that any and all IOEs mean FNF, which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}






[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355298#comment-16355298
 ] 

Rakesh R commented on HDFS-13110:
-

I've rebased the previous patch on the latest branch code and uploaded it to 
this jira. Reviews are appreciated, thanks!

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the assumption 
> that any and all IOEs mean FNF, which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}






[jira] [Reopened] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reopened HDFS-11187:
---
  Assignee: Gabor Bota  (was: Wei-Chiu Chuang)

Reopening this to add the change to branch-2

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.
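A minimal sketch of the in-memory caching idea (not the actual patch): the finalized replica keeps the last partial chunk checksum in a field that the writer updates on finalize/append and readers consult instead of re-reading the meta file under the dataset lock.
{noformat}
// Class and member names below are placeholders, not the HDFS code.
class FinalizedReplicaChecksumCache {
  // null when the block ends exactly on a chunk boundary
  private volatile byte[] lastPartialChunkChecksum;

  void setLastPartialChunkChecksum(byte[] checksum) {
    this.lastPartialChunkChecksum = checksum;   // writer side: finalize/append
  }

  byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum;            // reader side: no disk access
  }
}
{noformat}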






[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Attachment: HDFS-11187-branch-2.001.patch

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.






[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355327#comment-16355327
 ] 

genericqa commented on HDFS-13099:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}121m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.federation.metrics.TestFederationMetrics |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909585/HDFS-13099.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 8523ee89a1ef 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e5c2fdd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22974/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22974/testReport/ |

[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355329#comment-16355329
 ] 

Gabor Bota commented on HDFS-11187:
---

Adding HDFS-11187-branch-2.001.patch as a proposal for branch-2.
Please note that FsVolumeImpl.java#addFinalizedBlock returns java.io.File in 
branch-2 instead of org.apache.hadoop.hdfs.server.datanode.ReplicaInfo as in 
trunk. This is because ReplicaBuilder is not implemented in branch-2. If this is 
a breaking issue, then further work is needed to apply this patch to branch-2.

Thanks,
Gabor 

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.






[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Patch Available  (was: Reopened)

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.






[jira] [Updated] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13116:
-
Attachment: HDFS-13116-HDFS-7240.004.patch

> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch
>
>
> Currently the pipeline has information about both the container and the Transport 
> layer. This results in cases where new pipeline (i.e. transport) 
> information is allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; the {{Transport}} can then be shared between multiple 
> pipelines/containers.
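
As a rough illustration of the shape of that refactoring, a toy, self-contained sketch (all names are hypothetical; the real classes are the Ozone Pipeline/Transport types touched by this patch):

{code:java}
import java.util.Arrays;
import java.util.List;

/** Transport (membership) details live in one object... */
class TransportSketch {
  private final List<String> members;
  TransportSketch(List<String> members) { this.members = members; }
  List<String> getMembers() { return members; }
}

/** ...and each container-level pipeline only holds a reference to it. */
class ContainerPipelineSketch {
  private final String containerName;
  private final TransportSketch transport;   // shared, not re-allocated per container
  ContainerPipelineSketch(String containerName, TransportSketch transport) {
    this.containerName = containerName;
    this.transport = transport;
  }
  String describe() { return containerName + " -> " + transport.getMembers(); }
}

class SharedTransportExample {
  public static void main(String[] args) {
    TransportSketch shared = new TransportSketch(Arrays.asList("dn1", "dn2", "dn3"));
    // Creating another container no longer allocates new transport information.
    System.out.println(new ContainerPipelineSketch("container-1", shared).describe());
    System.out.println(new ContainerPipelineSketch("container-2", shared).describe());
  }
}
{code}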



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355414#comment-16355414
 ] 

genericqa commented on HDFS-11187:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 375 unchanged - 0 fixed = 376 total (was 375) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  1m  
8s{color} | {color:red} The patch generated 146 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:16 |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
|   | org.apache.hadoop.hdfs.TestRead |
|   | org.apache.hadoop.security.TestPermission |
|   | org.apache.hadoop.hdfs.web.TestWebHdfsTokens |
|   | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream |
|   | org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade |
|   | org.apache.hadoop.hdfs.TestFileAppendRestart |
|   | org.apache.hadoop.hdfs.TestReadWhileWriting |
|   | org.apache.hadoop.hdfs.TestDFSMkdirs |
|   | org.apache.hadoop.hdfs.TestDFSOutputStream |
|   | org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
|   | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr |
|   | org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |
|   | org.apache.hadoop.hdfs.TestDistributedFileSystem |
|   | org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication |
|   | org.apache.hadoop.hdfs.TestDFSShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 |
| JIRA Issue | HDFS-11187 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909611/HDFS-11187-branch-2.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4e7c43e67679 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 3446

[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355438#comment-16355438
 ] 

genericqa commented on HDFS-11701:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
20s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 54s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}204m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210 |
|   | hadoop.hdfs.TestHFlush |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerWithStripedBlocks |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
|   | hadoop.hdfs.TestWriteRead |
|   | hadoop.hdfs.TestErasureCodingMultipleRacks |
|   | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.tools.TestDFSHAAdminM

[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355456#comment-16355456
 ] 

genericqa commented on HDFS-13110:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
36s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 83 unchanged - 0 fixed = 86 total (was 83) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 34s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.sps.TestBlockStorageMovementAttemptedItems |
|   | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13110 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909608/HDFS-13110-HDFS-10285-01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux a01948583285 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-10285 / 4a42e7a |
| maven | version: Apa

[jira] [Updated] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13116:
-
Attachment: HDFS-13116-HDFS-7240.005.patch

> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch
>
>
> Currently the pipeline has information about both the container and the Transport 
> layer. This results in cases where new pipeline (i.e. transport) 
> information is allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; the {{Transport}} can then be shared between multiple 
> pipelines/containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355476#comment-16355476
 ] 

Yiqun Lin commented on HDFS-13099:
--

Attaching the patch to fix the failed unit test.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch
>
>
> Currently the State Store Driver relevant settings only written in its 
> implement classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users know these settings and 
> know how to use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13099:
-
Attachment: HDFS-13099.005.patch

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the State Store Driver relevant settings only written in its 
> implement classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users know these settings and 
> know how to use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13022) Block Storage: Kubernetes dynamic persistent volume provisioner

2018-02-07 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355486#comment-16355486
 ] 

Mukul Kumar Singh commented on HDFS-13022:
--

Thanks for the updated patch [~elek]. The new patch looks really good. Some 
minor comments:

1) Please fix the findbugs and checkstyle issues.
2) Nitpick: in CBlockManager.java, please move the static imports to the section 
with static imports; the same needs to be done for DynamicProvisioner.java.
3) DynamicProvisioner.java#stop should join on the watcher thread as well (a 
generic sketch follows below).
4) Should we add the ASF license to the json files as well?
5) LICENSE.txt should also be updated as part of this change.
6) I feel that the file ozone-site.xml is not needed; can we set the required 
fields in the test (TestDynamicProvisioner)?
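
Regarding 3), a generic lifecycle sketch of the stop-and-join idea (this is not the actual DynamicProvisioner code, just an illustration):

{code:java}
/** Generic sketch: stop() signals the watcher thread and then joins it. */
class WatcherLifecycleSketch {
  private volatile boolean running = true;
  private final Thread watcher = new Thread(this::watchLoop, "provisioner-watcher");

  void start() { watcher.start(); }

  void stop() throws InterruptedException {
    running = false;
    watcher.interrupt();   // wake the thread if it is blocked waiting for events
    watcher.join();        // do not return until the watcher has actually exited
  }

  private void watchLoop() {
    while (running && !Thread.currentThread().isInterrupted()) {
      // poll the event stream and provision volumes here
    }
  }
}
{code}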

> Block Storage: Kubernetes dynamic persistent volume provisioner
> ---
>
> Key: HDFS-13022
> URL: https://issues.apache.org/jira/browse/HDFS-13022
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13022-HDFS-7240.001.patch, 
> HDFS-13022-HDFS-7240.002.patch, HDFS-13022-HDFS-7240.003.patch, 
> HDFS-13022-HDFS-7240.004.patch
>
>
> With HDFS-13017 and HDFS-13018 the cblock/jscsi server could be used in a 
> kubernetes cluster as the backend for iscsi persistent volumes.
> Unfortunately we need to create all the required cblocks manually with 'hdfs 
> cblock -c user volume...' for all the Persistent Volumes.
>  
> But it could be handled with a simple optional component. An additional 
> service could listen on the kubernetes event stream. In the case of a new 
> PersistentVolumeClaim (where the storageClassName is cblock), the cblock 
> server could create the cblock in advance AND the persistent volume could 
> be created.
>  
> The code is very simple, and this additional component could be optional in 
> the cblock server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13117) Proposal to support writing replications to HDFS asynchronously

2018-02-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355520#comment-16355520
 ] 

Kihwal Lee commented on HDFS-13117:
---

During writes, the data is written to multiple nodes concurrently. A client may 
experience slowness when one of the nodes is slow (e.g. transient I/O 
overload), but you can encounter such a node even if you write only one copy. 
Also, with only one copy, a single failure will fail the write permanently, which 
users normally cannot afford. HDFS was originally designed for batch processing, 
but is widely used in more demanding environments today. It could be a totally 
wrong choice for your app, or it could become usable with configuration changes. 
More analysis is needed to warrant a design change.

What is your application's write performance requirement? What are you seeing 
in your cluster? What version of Hadoop are you running? Did you profile 
or jstack the datanodes or clients, by any chance? Does the app periodically 
sync/hsync/hflush the stream? Do streams tend to hang in the middle or at the 
end of a block? Do you see frequent pipeline breakages and recoveries with the 
slowness? What is the I/O scheduler on the datanodes?

> Proposal to support writing replications to HDFS asynchronously
> ---
>
> Key: HDFS-13117
> URL: https://issues.apache.org/jira/browse/HDFS-13117
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: xuchuanyin
>Priority: Major
>
> My initial question was as below:
> ```
> I've learned that when we write data to HDFS using the interface provided by 
> HDFS such as 'FileSystem.create', our client will block until all the blocks 
> and their replicas are done. This will cause an efficiency problem if we use 
> HDFS as our final data storage. And many of my colleagues write the data to 
> local disk in the main thread and copy it to HDFS in another thread. 
> Obviously, this increases the disk I/O.
> 
>    So, is there a way to optimize this usage? I don't want to increase the 
> disk I/O, nor do I want to be blocked during the writing of the extra 
> replicas.
>   How about writing to HDFS with a replication factor of one in the main 
> thread and setting the actual replication factor in another thread? Or is 
> there any better way to do this?
> ```
>  
> So my proposal here is to support writing extra replicas to HDFS 
> asynchronously. The user can set a minimum replication factor as the acceptable 
> number of replicas (< the default or expected replication factor). When writing 
> to HDFS, the user will only be blocked until the minimum number of replicas has 
> been written, and HDFS will continue to complete the extra replicas in the 
> background. Since HDFS will periodically check the integrity of all the 
> replicas, we can also leave this work to HDFS itself.
>  
> There are ways to provide the interfaces:
> 1. Creating a series of interfaces by adding an `acceptableReplication` 
> parameter to the current interfaces as below:
> ```
> Before:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   long blockSize
> ) throws IOException
>  
> After:
> FSDataOutputStream create(Path f,
>   boolean overwrite,
>   int bufferSize,
>   short replication,
>   short acceptableReplication, // minimum number of replication to finish 
> before return
>   long blockSize
> ) throws IOException
> ```
>  
> 2. Adding `acceptableReplication` and `asynchronous` to the runtime (or 
> default) configuration, so the user will not have to change any interface and 
> will still benefit from this feature.
>  
> How do you think about this?
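
To make option 1 of the proposal concrete, a hypothetical usage example (the extra acceptableReplication parameter does not exist in any current Hadoop release, and fs/payload are assumed to be an existing FileSystem handle and a byte[]; this only shows how the proposed call might look):

{code:java}
// Hypothetical only -- illustrates the proposed create() overload, not an existing API.
FSDataOutputStream out = fs.create(new Path("/data/events.log"),
    true,                   // overwrite
    4096,                   // bufferSize
    (short) 3,              // replication: eventual number of replicas
    (short) 1,              // acceptableReplication: block only until 1 replica is durable
    134217728L);            // blockSize (128 MB)
out.write(payload);
out.close();                // returns once the acceptable number of replicas is written;
                            // the remaining replicas are completed asynchronously by HDFS
{code}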



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13105) Make hadoop proxy user changes reconfigurable in Datanode

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-13105.
--
Resolution: Not A Problem

As pointed out by [~kihwal] & [~rajive], -refreshSuperUserGroupsConfiguration 
provides a way to update proxy user information on the NN.

> Make hadoop proxy user changes reconfigurable in Datanode
> -
>
> Key: HDFS-13105
> URL: https://issues.apache.org/jira/browse/HDFS-13105
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>
> Currently any change to add/delete a proxy user requires a DN restart, 
> which means downtime. This jira proposes to make changes to the proxy user 
> configuration reconfigurable via the ReconfigurationProtocol so that the 
> changes can take effect without a DN restart. For details please refer to 
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Superusers.html.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13104) Make hadoop proxy user changes reconfigurable in Namenode

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13104:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

As pointed out by [~kihwal], -refreshSuperUserGroupsConfiguration provides a way 
to update proxy user information on the NN.
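
For reference, the refresh path mentioned above is normally driven with dfsadmin once the proxy user entries have been updated in the configuration:

{noformat}
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
{noformat}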

> Make hadoop proxy user changes reconfigurable in Namenode
> -
>
> Key: HDFS-13104
> URL: https://issues.apache.org/jira/browse/HDFS-13104
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-13104.001.patch, HDFS-13104.002.patch
>
>
> Currently any change to add/delete a proxy user requires an NN restart, 
> which means downtime. This jira proposes to make changes to the proxy user 
> configuration reconfigurable via the ReconfigurationProtocol so that the 
> changes can take effect without an NN restart. For details please refer to 
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Superusers.html.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2018-02-07 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355650#comment-16355650
 ] 

Erik Krogen commented on HDFS-10453:


Re: the v008 patch, it looks like you are using {{getPreferredBlockSize()}} instead of 
{{getNumBytes()}}; that does not seem right. Was it an unintentional change?

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and lose data with low 
> probability. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication (to 10) of the file;
> (3) delete the file while the ReplicationMonitor is scheduling blocks belonging to 
> that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode will print logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and then clears the 
> references in the blocksmap, neededReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which is used to indicate that the block 
> deletion does not need an explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node will be selected after traversing 
> the whole cluster because no node satisfies the goodness criteria 
> (remaining space reaching the required size of Long.MAX_VALUE). 
> During stage #3 the ReplicationMonitor is stuck for a long time, especially in a 
> large cluster. invalidateBlocks & neededReplications continue to grow with no 
> consumers. At worst it will lose data.
> This can mostly be avoided by skipping chooseTarget for BlockCommand.NO_ACK blocks 
> and removing them from neededReplications.
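
A self-contained sketch of that avoidance (names are hypothetical; the real fix would sit in the replication scheduling path of the BlockManager):

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch only: skip blocks whose size was stamped NO_ACK by a concurrent delete. */
class ReplicationSkipSketch {
  static final long NO_ACK = Long.MAX_VALUE;   // marker used for deleted blocks

  static class Block {
    final long numBytes;
    Block(long numBytes) { this.numBytes = numBytes; }
  }

  /** Drop deleted blocks from the needed-replication queue and return the rest. */
  static List<Block> blocksToSchedule(List<Block> neededReplications) {
    List<Block> schedule = new ArrayList<>();
    for (Iterator<Block> it = neededReplications.iterator(); it.hasNext(); ) {
      Block b = it.next();
      if (b.numBytes == NO_ACK) {
        it.remove();          // block is being deleted: do not call chooseTarget for it
        continue;
      }
      schedule.add(b);        // only these go on to chooseTarget
    }
    return schedule;
  }
}
{code}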



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional command

[jira] [Comment Edited] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2018-02-07 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355650#comment-16355650
 ] 

Erik Krogen edited comment on HDFS-10453 at 2/7/18 4:16 PM:


Re: the v008 patch, it looks like you are using {{getPreferredBlockSize()}} instead of 
{{getNumBytes()}}; that does not seem right. Was it an unintentional change?

Other than that I am pleased with the simplicity of the new change. This looks 
valid to go into trunk~2.8 as well.


was (Author: xkrogen):
Re: the v008 patch, it looks like you are using {{getPreferredBlockSize()}} instead of 
{{getNumBytes()}}; that does not seem right. Was it an unintentional change?

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and lose data with low 
> probability. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication (to 10) of the file;
> (3) delete the file while the ReplicationMonitor is scheduling blocks belonging to 
> that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode will print logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and then clears the 
> references in the blocksmap, neededReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which is used to indicate that the block 
> deletion does not need an explicit ACK from the node. 
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node will be selected after traversing 
> the whole cluster because no node satisfies the goodness criteria 
> (remaining space reaching the required size of Long.MAX_VALUE). 
> During stage #3 the ReplicationMonitor is stuck for a long time, especially in a 
> large cluster. invalidateBlocks & neededReplications continue to grow with no 
> consumers. At worst it will lose data.

[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355659#comment-16355659
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.ozone.web.client.TestKeys |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.cblock.TestBufferManager |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
|   | hadoop.ozone.scm.TestSCMCli |
|   | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.cblock.TestCBlockReadWrite |
|   | hadoop.ozone.TestOzoneConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-

[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-02.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-02.patch, 
> HDFS-13110-HDFS-10285-00.patch, HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}
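
For Comment-10, the distinction being asked for can be sketched as follows (hypothetical helper, not the actual IntraSPSNameNodeContext code): only FileNotFoundException should be treated as "the file is gone"; other IOExceptions, such as RPC failures, should propagate to the caller.

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

/** Sketch only: do not swallow every IOException as if it meant file-not-found. */
class GetFileInfoSketch {
  interface NamenodeView {
    Object getFileInfo(String path) throws IOException;
  }

  static Object fileInfoOrNull(NamenodeView nn, String path) throws IOException {
    try {
      return nn.getFileInfo(path);            // any non-FNF IOException is rethrown
    } catch (FileNotFoundException fnf) {
      return null;                            // the file really no longer exists
    }
  }
}
{code}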



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: (was: HDFS-13110-HDFS-02.patch)

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-02.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355685#comment-16355685
 ] 

Rakesh R commented on HDFS-13110:
-

Attached a new patch fixing the test case failures and checkstyle warnings.

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355747#comment-16355747
 ] 

genericqa commented on HDFS-13099:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13099 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909619/HDFS-13099.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 21b8ca2fa443 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e5c2fdd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22980/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22980/testRep

[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355759#comment-16355759
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
19s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
17s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project: The patch generated 10 new 
+ 0 unchanged - 0 fixed = 10 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
8s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 56s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}205m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.ozone.tools.TestCorona |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.ozone.web.client.TestKeys |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-

[jira] [Commented] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355814#comment-16355814
 ] 

Mukul Kumar Singh commented on HDFS-13078:
--

Added some debug logging and determined that the reads were failing because of
{code}
org.apache.ratis.shaded.io.grpc.StatusRuntimeException: INTERNAL: Frame size 
16777607 exceeds maximum: 4194304. If this is normal, increase the 
maxMessageSize in the channel/server builder
{code}

This fix will require RATIS-197, which provides an option to specify the max 
message size for RaftClient. Once RATIS-197 is in, this jira can be fixed as well.
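
For reference, the 4194304-byte limit in the log above is gRPC's default 
per-message size. The sketch below is illustrative only: it is not the actual 
RATIS-197 change, it uses the plain io.grpc API rather than the shaded copy 
bundled in Ratis (org.apache.ratis.shaded.io.grpc), and the host, port and 
32 MB limit are made-up values.
{code}
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

public class LargeMessageChannelSketch {
  public static void main(String[] args) {
    // 32 MB: comfortably above the ~16 MB frame reported in the log,
    // versus the 4 MB gRPC default.
    int maxMessageSize = 32 * 1024 * 1024;

    // Raising the per-message limit on the channel is the kind of knob the
    // "increase the maxMessageSize in the channel/server builder" hint refers to.
    ManagedChannel channel = NettyChannelBuilder
        .forAddress("datanode-host", 9858)        // hypothetical endpoint
        .usePlaintext()
        .maxInboundMessageSize(maxMessageSize)
        .build();

    // ... create the client stub on top of this channel as usual ...
    channel.shutdown();
  }
}
{code}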


> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.cha

[jira] [Updated] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13078:
-
Status: Patch Available  (was: Open)

> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> org.apac

[jira] [Updated] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13078:
-
Attachment: HDFS-13078-HDFS-7240.001.patch

> Ozone: Ratis read fail because stream is closed before the reply is received
> 
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.ratis.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> or

[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355827#comment-16355827
 ] 

Brahma Reddy Battula commented on HDFS-12935:
-

Latest trunk patch LGTM, committing shortly. Will re-upload the branch-2 patch to 
trigger Jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, 
> HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-12935:

Attachment: HDFS-12935.009-branch-2.patch

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355838#comment-16355838
 ] 

Lokesh Jain commented on HDFS-11701:


The unit test failures are not related.

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shutdown and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355839#comment-16355839
 ] 

Brahma Reddy Battula commented on HDFS-12935:
-

{{Committed to {{trunk}} and branch-3.0 .[~jiangjianfei] thanks for your 
contribution, appreciate your dedication towards close this Jira.}}

 

Re-uploaded the branch-2 patch to run jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355839#comment-16355839
 ] 

Brahma Reddy Battula edited comment on HDFS-12935 at 2/7/18 6:13 PM:
-

{{Committed to trunk}} and branch-3.0. [~jiangjianfei], thanks for your 
contribution; I appreciate your dedication towards closing this Jira.

 

Re-uploaded the branch-2 patch to run jenkins.


was (Author: brahmareddy):
{{Committed to {{trunk}} and branch-3.0 .[~jiangjianfei] thanks for your 
contribution, appreciate your dedication towards close this Jira.}}

 

Re-uploaded the branch-2 patch to run jenkins.

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13112) Token expiration edits may cause log corruption or deadlock

2018-02-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355844#comment-16355844
 ] 

Kihwal Lee commented on HDFS-13112:
---

The patch looks good.
- The addition of read locks ensures these edit logging activities do not 
collide with edit rolling or HA transitions (in addition to the level of safety 
provided by {{noInterruptsLock}}).
- A write lock is not required since these don't change any state that other 
threads are accessing with a read lock.

As long as only the secret manager is edit logging with a read lock and all 
others are using a write lock, there can be no concurrent edit logging, and 
this covers the general {{FSEditLog}} thread safety issue, not only the issue 
between logging and rolling.

Now, if we believe that it is only unsafe between edit logging and rolling 
(i.e. normal edit logging activities are thread safe), we could make 
{{getDelegationToken()}}, {{renewDelegationToken()}} and 
{{cancelDelegationToken()}} acquire a read lock.  And perhaps lease-related 
calls too.  Any thoughts on this?

In any case, I'm +1 on the patch. If you think we can make additional locking 
changes, please file a follow-up jira.
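
To make the invariant concrete, here is a minimal, self-contained sketch of the 
locking pattern described above (made-up class and method names, not the actual 
FSNamesystem/FSEditLog code): edit writers take the shared lock, while rolling 
and HA transitions take the exclusive lock, so an edit can never be interleaved 
with an end/start-segment pair.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class EditLoggingLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  /** e.g. a token-expiration edit issued by the secret manager. */
  void logTokenExpiration(String tokenId) {
    fsLock.readLock().lock();        // shared with other edit writers
    try {
      appendEdit("expireToken:" + tokenId);
    } finally {
      fsLock.readLock().unlock();
    }
  }

  /** e.g. edit log rolling or an HA state transition. */
  void rollEditLog() {
    fsLock.writeLock().lock();       // excludes every edit writer
    try {
      appendEdit("endSegment");
      appendEdit("startSegment");
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  private void appendEdit(String op) {
    // stand-in for FSEditLog#logEdit + logSync
    System.out.println("edit: " + op);
  }
}
{code}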

> Token expiration edits may cause log corruption or deadlock
> ---
>
> Key: HDFS-13112
> URL: https://issues.apache.org/jira/browse/HDFS-13112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.1.0-beta, 0.23.8
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-13112.patch
>
>
> HDFS-4477 specifically did not acquire the fsn lock during token cancellation 
> based on the belief that edit logs are thread-safe.  However, log rolling is 
> not thread-safe.  Failure to externally synchronize on the fsn lock during a 
> roll will cause problems.
> For sync edit logging, it may cause corruption by interspersing edits with 
> the end/start segment edits.  Async edit logging may encounter a deadlock if 
> the log queue overflows.  Luckily, losing the race is extremely rare.  In ~5 
> years, we've never encountered it.  However, HDFS-13051 lost the race with 
> async edits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-12935:

Fix Version/s: 3.0.2
   3.0.1
   3.1.0

> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.0.2
>
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355848#comment-16355848
 ] 

Hudson commented on HDFS-12935:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13629 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13629/])
HDFS-12935. Get ambiguous result for DFSAdmin command in HA mode when (brahma: 
rev 01bd6ab18fa48f4c7cac1497905b52e547962599)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java


> Get ambiguous result for DFSAdmin command in HA mode when only one namenode 
> is up
> -
>
> Key: HDFS-12935
> URL: https://issues.apache.org/jira/browse/HDFS-12935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0-beta1, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.0.2
>
> Attachments: HDFS-12935.002.patch, HDFS-12935.003.patch, 
> HDFS-12935.004.patch, HDFS-12935.005.patch, HDFS-12935.006-branch.2.patch, 
> HDFS-12935.006.patch, HDFS-12935.007-branch.2.patch, HDFS-12935.007.patch, 
> HDFS-12935.008.patch, HDFS-12935.009-branch-2.patch, 
> HDFS-12935.009-branch.2.patch, HDFS-12935.009.patch, HDFS_12935.001.patch
>
>
> In HA mode, if one namenode is down, most of functions can still work. When 
> considering the following two occasions:
>  (1)nn1 up and nn2 down
>  (2)nn1 down and nn2 up
> These two occasions should be equivalent. However, some of the DFSAdmin 
> commands will have ambiguous results. The commands can be send successfully 
> to the up namenode and are always functionally useful only when nn1 is up 
> regardless of exception (IOException when connecting to the down namenode 
> nn2). If only nn2 is up, the commands have no use at all and only exception 
> to connect nn1 can be found.
> See the following command "hdfs dfsadmin setBalancerBandwidth" which aim to 
> set balancer bandwidth value for datanodes as an example. It works and all 
> the datanodes can get the setting values only when nn1 is up. If only nn2 is 
> up, the command throws exception directly and no datanode get the bandwidth 
> setting. Approximately ten DFSAdmin commands use the similar logical process 
> and may be ambiguous.
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn1
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 12345
> *Balancer bandwidth is set to 12345 for jiangjianfei01/172.17.0.14:9820*
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei02:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# hdfs haadmin -getServiceState nn2
> active
> [root@jiangjianfei01 ~]# hdfs dfsadmin -setBalancerBandwidth 1234
> setBalancerBandwidth: Call From jiangjianfei01/172.17.0.14 to 
> jiangjianfei01:9820 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> [root@jiangjianfei01 ~]# 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13116:
-
Attachment: HDFS-13116-HDFS-7240.006.patch

> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently pipeline has information about both the container as well Transport 
> layer. This results in cases where a new pipeline (i.e. transport) 
> information is allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container, then the {{Transport}} can be shared between multiple 
> pipelines/containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355897#comment-16355897
 ] 

Brahma Reddy Battula commented on HDFS-12990:
-

Should this be committed to branch-3.0, which is going to be 3.0.2?

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
> default ports to ephemeral ports, which is very appreciated by admin. As part 
> of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to other ports changed there.
> With more integration going on, it appears that all the other ephemeral port 
> changes are fine, but the NN RPC port change is painful for downstream on 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since that does not include the port 
> number. But considering the downstream impact, instead of requiring all of 
> them change their stuff, it would be a way better experience to leave the NN 
> port unchanged. This will benefit Hadoop 3 adoption and ease unnecessary 
> upgrade burdens.
> It is of course incompatible, but giving 3.0.0 is just out, IMO it worths to 
> switch the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355903#comment-16355903
 ] 

Anu Engineer commented on HDFS-12990:
-

[~brahmareddy] I will commit this to 3.0 now. Thanks for flagging it.

 

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
> default ports to ephemeral ports, which is very appreciated by admin. As part 
> of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to other ports changed there.
> With more integration going on, it appears that all the other ephemeral port 
> changes are fine, but the NN RPC port change is painful for downstream on 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since that does not include the port 
> number. But considering the downstream impact, instead of requiring all of 
> them change their stuff, it would be a way better experience to leave the NN 
> port unchanged. This will benefit Hadoop 3 adoption and ease unnecessary 
> upgrade burdens.
> It is of course incompatible, but giving 3.0.0 is just out, IMO it worths to 
> switch the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355918#comment-16355918
 ] 

Jitendra Nath Pandey commented on HDFS-11701:
-

+1, I will commit shortly.

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shutdown and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12933) Improve logging when DFSStripedOutputStream failed to write some blocks

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355932#comment-16355932
 ] 

Brahma Reddy Battula commented on HDFS-12933:
-

Does this need to be in branch-3.0.1 also?

> Improve logging when DFSStripedOutputStream failed to write some blocks
> ---
>
> Key: HDFS-12933
> URL: https://issues.apache.org/jira/browse/HDFS-12933
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: chencan
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-12933.001.patch
>
>
> Currently if there are less DataNodes than the erasure coding policy's (# of 
> data blocks + # of parity blocks), the client sees this:
> {noformat}
> 09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Cannot allocate parity 
> block(index=13, policy=RS-10-4-1024k). Not enough datanodes? Exclude nodes=[]
> 09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Block group <1> has 1 
> corrupt blocks.
> {noformat}
> The 1st line is good. The 2nd line may be confusing to end users. We should 
> investigate the error and be more general / accurate. Maybe something like 
> 'failed to read x blocks'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355938#comment-16355938
 ] 

Jitendra Nath Pandey commented on HDFS-13081:
-

{quote}Delegation tokens send passwords in the clear over http.  Webhdfs is at 
high risk.
{quote}
That is a valid point. That explains why the check for HTTPS was added in the 
first place. It should be documented in the javadocs.

IIUC, the required checks are the following:

1) For RPC: it should either be on a privileged port or must use SASL for mutual 
authentication.

2) For HTTP: it should either be on a privileged port or must use HTTPS.

However, a combination like a privileged port for HTTP and SASL for RPC should 
also work.

The advantage of SASL is that it allows QoP negotiation, so different clients 
can choose encryption depending on where they are connecting from and the 
sensitivity of the data.

[~daryn], what are your thoughts on having a privileged port for HTTP with SASL 
for RPC?
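
A compact way to see the acceptable combinations is the decision table below. 
This is only a sketch of the logic being discussed, not the real 
Datanode#checkSecureConfig, and the helper names and example ports are made up.
{code}
final class SecureConfigCheckSketch {

  static boolean isPrivileged(int port) {
    return port > 0 && port < 1024;
  }

  /**
   * RPC (DataTransferProtocol): a privileged port OR SASL is acceptable.
   * HTTP: a privileged port OR HTTPS-only is acceptable.
   * Mixed combinations (e.g. privileged HTTP port + SASL for RPC) also pass.
   */
  static boolean isSecureSetup(int rpcPort, boolean saslForRpc,
                               int httpPort, boolean httpsOnly) {
    boolean rpcOk  = isPrivileged(rpcPort)  || saslForRpc;
    boolean httpOk = isPrivileged(httpPort) || httpsOnly;
    return rpcOk && httpOk;
  }

  public static void main(String[] args) {
    // Privileged HTTP port plus SASL for RPC: the combination asked about above.
    System.out.println(isSecureSetup(9866, true, 443, false));   // true
    // Unprivileged ports, no SASL, plain HTTP: rejected.
    System.out.println(isSecureSetup(9866, false, 9864, false)); // false
  }
}
{code}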

> Datanode#checkSecureConfig should check HTTPS and SASL encryption
> -
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13081.000.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine whether 
> secure datanode is enabled:
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain 
> HTTP) for the HTTP server. The SASL handshake guarantees authentication of 
> the RPC server before a client transmits a secret, such as a block access 
> token. Similarly, SSL guarantees authentication of the
>  HTTP server before a client transmits a secret, such as a delegation token.
> For the 2nd case, HTTPS_ONLY means all the traffic between the REST client and 
> server will be encrypted. However, checking only whether a SASL property resolver 
> is configured does not guarantee that the server requires an encrypted RPC. 
> This ticket is open to further check and ensure datanode SASL property 
> resolver has a QoP that includes auth-conf(PRIVACY). Note that the SASL QoP 
> (Quality of Protection) negotiation may drop RPC protection level from 
> auth-conf(PRIVACY) to auth-int(integrity) or auth(authentication) only, which 
> should be fine by design.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355937#comment-16355937
 ] 

Brahma Reddy Battula commented on HDFS-11187:
-

Does this need to be in branch-3.0.1 also?

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, 
> HDFS-11187.002.patch, HDFS-11187.003.patch, HDFS-11187.004.patch, 
> HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of last partial checksum in-memory and reduce disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.
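As a rough illustration of the idea, here is a minimal sketch of keeping the last partial chunk checksum in memory on the finalized replica. The class, field, and method names are assumptions, not the actual FsDatasetImpl/BlockSender code.

{code:java}
// Sketch only: cache the last partial chunk checksum so BlockSender does not have
// to re-read the meta file for every reader while holding the dataset lock.
final class FinalizedReplicaSketch {
  private volatile byte[] lastPartialChunkChecksum;

  byte[] getLastPartialChunkChecksum() {
    byte[] cached = lastPartialChunkChecksum;
    if (cached != null) {
      return cached;                          // served from memory, no disk access
    }
    cached = readLastChecksumFromMetaFile();  // fall back to the meta file once
    lastPartialChunkChecksum = cached;
    return cached;
  }

  // Called by the writer path whenever the last partial chunk changes (e.g. append).
  void setLastPartialChunkChecksum(byte[] checksum) {
    lastPartialChunkChecksum = checksum;
  }

  private byte[] readLastChecksumFromMetaFile() {
    // Placeholder for reading the trailing checksum bytes from the replica's meta file.
    return new byte[0];
  }
}
{code}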



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355941#comment-16355941
 ] 

Anu Engineer commented on HDFS-12990:
-

Committed to branch-3.0 also.

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.0.1
>
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all 
> default ports out of the ephemeral range, which is much appreciated by admins. 
> As part of that change, we also modified the NN RPC port from the famous 8020 to 
> 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other ephemeral port 
> changes are fine, but the NN RPC port change is painful for downstream on 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port 
> number. But considering the downstream impact, instead of requiring all of 
> them to change their setups, it would be a far better experience to leave the NN 
> port unchanged. This will benefit Hadoop 3 adoption and ease unnecessary 
> upgrade burdens.
> It is of course incompatible, but given that 3.0.0 is just out, IMO it is worth 
> switching the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-11701:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

I have committed this to the trunk. Thanks [~ljain]!

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shutdown and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.
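For readers following the fix, here is a minimal sketch of the kind of guard being discussed: skip the local/short-circuit path when the cached datanode address no longer resolves, instead of dereferencing a null InetAddress. This is illustrative only and not the committed patch, which touches DomainSocketFactory, ClientContext, and related classes.

{code:java}
import java.net.InetSocketAddress;

final class UnresolvedHostGuardSketch {
  /** Sketch only: returns true when a cached datanode address can no longer be
   *  resolved, so the caller can treat the node as unusable and refetch block
   *  locations rather than hit an NPE on getAddress(). */
  static boolean isUnresolved(InetSocketAddress targetAddr) {
    return targetAddr == null
        || targetAddr.isUnresolved()
        || targetAddr.getAddress() == null;
  }
}
{code}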



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355959#comment-16355959
 ] 

genericqa commented on HDFS-13110:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 7s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}138m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13110 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909631/HDFS-13110-HDFS-10285-02.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 952a16ecd4a2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-10285 / 4a42e7a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/22981/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Result

[jira] [Commented] (HDFS-11701) NPE from Unresolved Host causes permanent DFSInputStream failures

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355965#comment-16355965
 ] 

Hudson commented on HDFS-11701:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13631 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13631/])
HDFS-11701. NPE from Unresolved Host causes permanent DFSInputStream (jitendra: 
rev b061215ecfebe476bf58f70788113d1af816f553)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderFactory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DomainSocketFactory.java


> NPE from Unresolved Host causes permanent DFSInputStream failures
> -
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.0
> Environment: AWS Centos linux running HBase CDH 5.9.0 and HDFS CDH 
> 5.9.0
>Reporter: James Moore
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, 
> HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE due to the DFSInputStream storing 
> old cached block locations from hosts which could no longer resolve.
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, the DFSInputStream appears to have been open for upwards 
> of 3-4 weeks and had cached block locations from decommissioned nodes that no 
> longer resolve in DNS and had been shutdown and removed from the cluster 2 
> weeks prior.  If the DFSInputStream had refreshed its block locations from 
> the name node, it would have received alternative block locations which would 
> not contain the decommissioned data nodes.  As the above NPE leaves the 
> non-resolving data node in the list of block locations the DFSInputStream 
> never refreshes the block locations and all attempts to open a BlockReader 
> for the given blocks will fail.
> In our case, we resolved the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> which can't be resolved in DNS or at least the blocks requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13078) Ozone: Ratis read fail because stream is closed before the reply is received

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355968#comment-16355968
 ] 

genericqa commented on HDFS-13078:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
8s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
12s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
51s{color} | {color:red} The patch generated 640 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.web.client.TestOzoneClient |
|   | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.TestDFSOutputStream |
|   | hadoop.ozone.scm.TestXceiverClientManager |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.ozone.container.replication.TestContainerReplicationManager |
|   | hadoop.ozone.TestMiniOzoneCluster |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.fs.TestHDFSFileContextMainOperations |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.ozone.web.client.TestBuckets |
|   | hadoop.fs.viewfs.TestViewFsWithAcls |
|   | hadoop.ozone.scm.node.TestQueryNode |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFail

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356011#comment-16356011
 ] 

Yongjun Zhang commented on HDFS-13115:
--

Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio 
config would be more intuitive as a floating point, like the other ratio-style 
config parameters in DFSConfigKeys.java. I also noticed that the default value in 
the code and the one in hdfs-default.xml are not the same; we need to make them 
consistent.

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}},  
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The 
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some 
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard, 
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it 
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code that 
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.
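A minimal sketch of the proposed safeguard, based on the loop quoted above; this is illustrative only and the actual patch may structure the checks differently:

{code:java}
// Sketch only: skip ids whose inode has already been removed from the FSDirectory
// map, instead of calling asFile() on a null inode.
for (Long id : getINodeIdWithLeases()) {
  final INode inode = fsnamesystem.getFSDirectory().getInode(id);
  if (inode == null || !inode.isFile()) {
    continue;  // the file was deleted after its id was collected; ignore it
  }
  final INodeFile cons = inode.asFile();
  if (!cons.isUnderConstruction()) {
    continue;  // also tolerate files that are no longer under construction
  }
  BlockInfo[] blocks = cons.getBlocks();
  if (blocks == null) {
    continue;
  }
  for (BlockInfo b : blocks) {
    if (!b.isComplete()) {
      numUCBlocks++;
    }
  }
}
{code}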



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12935) Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356019#comment-16356019
 ] 

genericqa commented on HDFS-12935:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
16s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 400 unchanged - 1 fixed = 400 total (was 401) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 49s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  1m 
31s{color} | {color:red} The patch generated 260 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:27 |
| Failed junit tests | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.server.datanode.TestReadOnlySharedStorage |
|   | hadoop.hdfs.web.TestHttpsFileSystem |
|   | hadoop.hdfs.server.datanode.TestDataNodeTransferSocketSize |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
|   | hadoop.hdfs.server.datanode.TestTriggerBlockReport |
|   | hadoop.hdfs.web.TestWebHDFSAcl |
|   | hadoop.hdfs.server.datanode.TestStorageReport |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
|   | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | org.apache.hadoop.hdfs.server.datanode.TestDataNodeFaultInjector |
|   | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream |
|   | org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade |
|   | org.apache.hadoop.hdfs.TestFileAppendRestart |
|   | org.apache.hadoop.hdfs.security.TestDelegationToken |
|   | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter |
|   | org.apache.hadoop.hdfs.TestDFSMkdirs |
|   | org.apache.hadoop.hdfs.TestDFSOutputStream |
|   | org.apache.hadoop.hdfs.web.TestWebHDFS |
|   

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356025#comment-16356025
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13115:


> ... and Tsz Wo Nicholas Sze for the review.

I would like to clarify one more time that I have reviewed neither the patch nor 
the results. I have taken quick looks at the results but, honestly, I have not 
checked the details. Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}},  
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The 
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some 
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard, 
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it 
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code that 
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Comment: was deleted

(was: Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the 
review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion, the ratio 
config is more intuitive to be a floating point, like other ratio kind of 
config parameters in DFSConfigKeys.java. I noticed that the default value in 
the code and in hdfs-default.xml is not the same. We need to make them same.

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.

 

 

 )

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}},  
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The 
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some 
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard, 
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it 
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code that 
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356026#comment-16356026
 ] 

Yongjun Zhang commented on HDFS-12051:
--

Thanks [~mi...@cloudera.com] for the new revs and [~szetszwo] for the review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio 
config would be more intuitive as a floating point, like the other ratio-style 
config parameters in DFSConfigKeys.java. I also noticed that the default value in 
the code and the one in hdfs-default.xml are not the same; we need to make them 
consistent.

Hi [~szetszwo], are you ok with setting the default cache ratio to 1/400 
(0.0025)?  Given that the existing cache is not working well for some cases we 
examined, would you agree to push this forward?

Thanks.
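For context, here is a hedged sketch of the interning idea this JIRA discusses: canonicalize duplicate byte[] name arrays so identical names share one instance. The class name, the cap, and deriving the cap from a ratio-style config are assumptions for illustration, not the actual NameCache patch.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: ByteBuffer.wrap() gives content-based equals/hashCode, so the map
// acts as a content-addressed intern pool. Assumes interned arrays are never
// mutated afterwards.
final class NameInterningSketch {
  private final ConcurrentHashMap<ByteBuffer, byte[]> cache = new ConcurrentHashMap<>();
  private final int maxEntries;

  // maxEntries could be derived from a ratio-style config, e.g. expectedInodes * ratio.
  NameInterningSketch(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  /** Returns a canonical byte[] for the given name so duplicates share one instance. */
  byte[] intern(byte[] name) {
    if (cache.size() >= maxEntries) {
      return name;  // cache is full: give up on interning this particular name
    }
    byte[] existing = cache.putIfAbsent(ByteBuffer.wrap(name), name);
    return existing != null ? existing : name;
  }
}
{code}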

 

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch
>
>
> When snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356028#comment-16356028
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13115:


And this seems not the correct JIRA?

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}},  
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The 
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some 
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard, 
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it 
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code that 
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356031#comment-16356031
 ] 

Tsz Wo Nicholas Sze commented on HDFS-12051:


> ... and Tsz Wo Nicholas Sze for the review.

I would like to clarify one more time that I have reviewed neither the patch nor 
the results. I have taken quick looks at the results but, honestly, I have not 
checked the details. Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch
>
>
> When snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockma

[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-13115:
---
Comment: was deleted

(was: And this seems not the correct JIRA?)

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}},  
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL . The 
> reason is explained in the comment.
> HDFS-12985 RCAed a case and solved that case, we saw that it fixes some 
> cases, but we are still seeing NullPointerException from FSnamesystem
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Create this jira to add a check whether the inode is removed, as a safeguard, 
> to avoid the NullPointerException.
> Looks that after the inodeid is returned by {{getINodeIdWithLeases()}}, it 
> got deleted from FSDirectory map.
> Ideally we should find out who deleted it, like in HDFS-12985. 
> But it seems reasonable to me to have a safeguard here, like other code that 
> calls to {{fsnamesystem.getFSDirectory().getInode(id)}} in the code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-13115:
---
Comment: was deleted

(was: > ... and Tsz Wo Nicholas Sze for the review.

I like to clarify one more time that I neither have reviewed the patch nor the 
results.  I do have taken quick looks on the results but, honestly, I have not 
checked the details.  Thanks.

> ... would you agree to push this forward?

I won't be able to comment on this.  Sorry.)

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; that fix covers some cases, 
> but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check on whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as was done in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356036#comment-16356036
 ] 

Yongjun Zhang commented on HDFS-13115:
--

Hi [~szetszwo],

Sorry, I posted my previous comment in the wrong Jira here. Would you please move 
your comment above to HDFS-12051?

Thanks.

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; that fix covers some cases, 
> but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check on whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as was done in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Comment: was deleted

(was: Hi [~szetszwo],

Sorry, I posted my previous comment in the wrong Jira here. Would you please move 
your comment above to HDFS-12051?

Thanks.

 )

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; that fix covers some cases, 
> but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check on whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as was done in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356082#comment-16356082
 ] 

Anu Engineer commented on HDFS-13116:
-

[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
* Conduit --> Something like PipelineInfo
* interface to Abstract --I am wondering if this makes future changes less 
flexible. For example, if we have erasure coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see it works well.

* {{PipelineManager#getPipeline}}
{code}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
 Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock? 

Here is what I am thinking -- please correct me if I am wrong.

-- This is the point of check 
{code}
  if (activeConduits.size() == 0) {
{code}

--  This is the point of use.
{code}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? Is that possible?
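
One way to close that check-then-use window, sketched below with placeholder names (this is not the patch code): take a stable snapshot of the active conduits, so the emptiness check and the index computation see the same size and a concurrent removal cannot cause a divide-by-zero.

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

class ConduitSelector {
  private final List<String> activeConduits = new CopyOnWriteArrayList<>();
  private final AtomicInteger conduitsIndex = new AtomicInteger();

  /** Round-robin pick over a stable snapshot; null when nothing is active. */
  String findOpenConduit() {
    Object[] snapshot = activeConduits.toArray();
    if (snapshot.length == 0) {
      return null;  // caller decides whether to allocate new nodes or fail
    }
    int next = Math.floorMod(conduitsIndex.incrementAndGet(), snapshot.length);
    return (String) snapshot[next];
  }
}
{code}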







> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently the pipeline has information about both the container and the 
> Transport layer. This results in new pipeline (i.e. transport) information 
> being allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; then the {{Transport}} can be shared between multiple 
> pipelines/containers.
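
A toy illustration of the separation described above (illustrative shapes only, not the actual Ozone classes): the transport membership is factored into its own object so that many per-container pipelines can reference one shared {{Transport}}.

{code:java}
import java.util.Arrays;
import java.util.List;

/** Who the replicas are and how to reach them; shareable across containers. */
class Transport {
  final List<String> members;
  Transport(List<String> members) { this.members = members; }
}

/** Per-container state that only references a shared Transport. */
class Pipeline {
  final String containerName;
  final Transport transport;
  Pipeline(String containerName, Transport transport) {
    this.containerName = containerName;
    this.transport = transport;
  }
}

class Demo {
  public static void main(String[] args) {
    Transport shared = new Transport(Arrays.asList("dn1", "dn2", "dn3"));
    Pipeline p1 = new Pipeline("container-1", shared);
    Pipeline p2 = new Pipeline("container-2", shared);  // same transport reused
    System.out.println(p1.transport == p2.transport);   // true
  }
}
{code}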



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356082#comment-16356082
 ] 

Anu Engineer edited comment on HDFS-13116 at 2/7/18 9:04 PM:
-

[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
 * Conduit --> Something like PipelineInfo
 * interface to Abstract --I am wondering if this makes future changes less 
flexible. For example, if we have erasure coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see it works well.

 * {{PipelineManager#getPipeline}}
{code:java}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock?

Here is what I am thinking – please correct me if I am wrong.

– This is the point of check
{code:java}
  if (activeConduits.size() == 0) {
{code}
– This is the point of use.
{code:java}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? is that possible?


was (Author: anu):
[~msingh] Thanks for working on this patch. It looks very good. I have some 
very minor comments.
* Conduit --> Something like PipelineInfo
* interface to Abstract --I am wondering if this makes future changes less 
flexible. For example, if we have erasure coded pipelines, I am not sure if 
this abstract class implementation would work. But for the current case of 
stand-alone and Ratis, I do see it works well.

* {{PipelineManager#getPipeline}}
{code}
  if (conduit == null) {
  LOG.error("Get conduit call failed. We are not able to find free nodes" +
  " or operational conduit.");
}
return new Pipeline(containerName, conduit);
 {code}
 Shouldn't we throw after the error, or do you want to return a new pipeline 
with conduit equal to null?

{{PipelineManager#findOpenConduit}}
Should this function be protected via a synchronized keyword or a lock? 

Here is what I am thinking -- please correct me if I am wrong.

-- This is the point of check 
{code}
  if (activeConduits.size() == 0) {
{code}

--  This is the point of use.
{code}
private int getNextIndex() {
return conduitsIndex.incrementAndGet() % activeConduits.size();
}
{code}
What happens if the activeConduits size becomes zero? Is that possible?







> Ozone: Refactor Pipeline to have transport and container specific information
> -
>
> Key: HDFS-13116
> URL: https://issues.apache.org/jira/browse/HDFS-13116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13116-HDFS-7240.001.patch, 
> HDFS-13116-HDFS-7240.002.patch, HDFS-13116-HDFS-7240.003.patch, 
> HDFS-13116-HDFS-7240.004.patch, HDFS-13116-HDFS-7240.005.patch, 
> HDFS-13116-HDFS-7240.006.patch
>
>
> Currently the pipeline has information about both the container and the 
> Transport layer. This results in new pipeline (i.e. transport) information 
> being allocated for each container creation.
> This code can be refactored so that the Transport information is separated 
> from the container; then the {{Transport}} can be shared between multiple 
> pipelines/containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Status: In Progress  (was: Patch Available)

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hd

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Attachment: HDFS-12051.11.patch

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.bl

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Status: Patch Available  (was: In Progress)

Addressed the latest comment by [~yzhangal] regarding expressing the cache size 
as a ratio of the heap size (switched from int to float).
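
For illustration only, here is a rough sketch of the general idea (a fixed-size, best-effort byte[] interning cache whose capacity is derived from a heap fraction); the names, sizing arithmetic and collision behavior are assumptions made for this example and not the NameCache implementation in the patch.

{code:java}
import java.util.Arrays;

class ByteArrayInterner {
  private final byte[][] slots;

  ByteArrayInterner(float heapFraction, int approxBytesPerEntry) {
    long budget = (long) (Runtime.getRuntime().maxMemory() * heapFraction);
    int size = (int) Math.max(1, Math.min(1 << 24, budget / approxBytesPerEntry));
    slots = new byte[size][];
  }

  /** Returns a shared copy for duplicate names; best effort on collisions. */
  byte[] intern(byte[] name) {
    int slot = Math.floorMod(Arrays.hashCode(name), slots.length);
    byte[] cached = slots[slot];
    if (cached != null && Arrays.equals(cached, name)) {
      return cached;       // reuse the existing array, drop the duplicate
    }
    slots[slot] = name;    // overwrite on miss or collision
    return name;
  }
}
{code}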

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThre

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356159#comment-16356159
 ] 

Hudson commented on HDFS-13115:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13632 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13632/])
HDFS-13115. In getNumUnderConstructionBlocks(), ignore the inodeIds for 
(yzhang: rev f491f717e9ee6b75ad5cfca48da9c6297e94a8f7)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java


> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; that fix covers some cases, 
> but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check on whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as was done in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13116) Ozone: Refactor Pipeline to have transport and container specific information

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356160#comment-16356160
 ] 

genericqa commented on HDFS-13116:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
49s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
16s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
13s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
48s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}194m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.TestOzoneConfigurationFields |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
|   | hadoop.ozone.scm.container.TestContainerStateManager |
|   | hadoop.ozone.TestMiniOzoneCluster |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.ozone.TestStorageContainerManager |
|   | hadoop.hdfs.TestHDFSFileSystemContract |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Ser

[jira] [Commented] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356167#comment-16356167
 ] 

Íñigo Goiri commented on HDFS-13099:


[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked and it was successful at running the ones that use the State Store. 
Most of them rely on the curator mini ZK cluster so I think this is safe.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the settings relevant to the State Store Driver are only defined in 
> its implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings 
> and understand how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13099) RBF: Use the ZooKeeper as the default State Store

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356167#comment-16356167
 ] 

Íñigo Goiri edited comment on HDFS-13099 at 2/7/18 10:24 PM:
-

[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked and it was successful at running the ones that use the State Store. 
Most of them rely on the curator mini ZK cluster so I think this is safe.

We could do the TestMetricsBase change in the RouterConfigBuilder itself when 
we set {{stateStore()}}.


was (Author: elgoiri):
[^HDFS-13099.005.patch] seems to run the unit tests properly.

I checked and it was successful at running the ones that use the State Store. 
Most of them rely on the curator mini ZK cluster so I think this is safe.

> RBF: Use the ZooKeeper as the default State Store
> -
>
> Key: HDFS-13099
> URL: https://issues.apache.org/jira/browse/HDFS-13099
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: incompatible, incompatibleChange
> Attachments: HDFS-13099.001.patch, HDFS-13099.002.patch, 
> HDFS-13099.003.patch, HDFS-13099.004.patch, HDFS-13099.005.patch
>
>
> Currently the settings relevant to the State Store Driver are only defined in 
> its implementation classes.
> {noformat}
> public class StateStoreZooKeeperImpl extends StateStoreSerializableImpl {
> ...
>   /** Configuration keys. */
>   public static final String FEDERATION_STORE_ZK_DRIVER_PREFIX =
>   DFSConfigKeys.FEDERATION_STORE_PREFIX + "driver.zk.";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH =
>   FEDERATION_STORE_ZK_DRIVER_PREFIX + "parent-path";
>   public static final String FEDERATION_STORE_ZK_PARENT_PATH_DEFAULT =
>   "/hdfs-federation";
> ..
> {noformat}
> Actually, they should be moved into class {{DFSConfigKeys}} and documented in 
> file {{hdfs-default.xml}}. This will help more users discover these settings 
> and understand how to use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13119) RBF: manage unavailable clusters

2018-02-07 Thread JIRA
Íñigo Goiri created HDFS-13119:
--

 Summary: RBF: manage unavailable clusters
 Key: HDFS-13119
 URL: https://issues.apache.org/jira/browse/HDFS-13119
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Íñigo Goiri


When a federated cluster has one of its subclusters down, operations that run in 
every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356211#comment-16356211
 ] 

Íñigo Goiri commented on HDFS-13119:


We had this happen the other day when we added a subcluster for testing and 
the Namenodes in this subcluster were down for a few days. The Routers ended up 
with thousands of threads trying to make RPC connections to the Namenodes that 
were down. One example was {{renewLease()}}: this operation is executed in all 
the subclusters, and we had connections stuck for more than 3 minutes 
because the default retry policy was to try 10 times with a timeout of 20 
seconds.

We should do a couple of things:
* Better control of the number of RPC clients
* No need to retry so many times if we "know" the subcluster is down (see the 
sketch below)
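
A minimal sketch of that second point, with assumed names and cooldown handling (this is not the Router code): after a failure, calls to that subcluster are skipped for a cooldown period instead of going through the full retry policy every time.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SubclusterAvailability {
  private final Map<String, Long> downSince = new ConcurrentHashMap<>();
  private final long cooldownMs;

  SubclusterAvailability(long cooldownMs) {
    this.cooldownMs = cooldownMs;
  }

  /** True if we should attempt an RPC to this nameservice right now. */
  boolean shouldTry(String nsId) {
    Long since = downSince.get(nsId);
    return since == null || System.currentTimeMillis() - since > cooldownMs;
  }

  void markFailure(String nsId) {
    downSince.putIfAbsent(nsId, System.currentTimeMillis());
  }

  void markSuccess(String nsId) {
    downSince.remove(nsId);
  }
}
{code}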

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13119:
---
Summary: RBF: Manage unavailable clusters  (was: RBF: manage unavailable 
clusters)

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~billyean] and [~jojochuang] for the review. I committed to trunk, 
3.0.1, branch-2.

 

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return NULL. The 
> reason is explained in the comment.
> HDFS-12985 root-caused and fixed one such case; that fix covers some cases, 
> but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check on whether the inode has been removed, as a 
> safeguard, to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as was done in HDFS-12985.
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356249#comment-16356249
 ] 

genericqa commented on HDFS-12051:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 1235 unchanged - 18 fixed = 1238 total (was 1253) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
2s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Increment of volatile field 
org.apache.hadoop.hdfs.server.namenode.NameCache.size in 
org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  At 
NameCache.java:in org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  
At NameCache.java:[line 119] |
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12051 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909669/HDFS-12051.11.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux c995c1528d5b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personali

[jira] [Created] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13120:
-

 Summary: Snapshot diff could be corrupted after concat
 Key: HDFS-13120
 URL: https://issues.apache.org/jira/browse/HDFS-13120
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The snapshot diff can be corrupted after concatenating files. This could lead to 
assertion failures upon later DeleteSnapshot and getSnapshotDiff operations.

For example, we have seen customers hit a stack trace similar to the one below, 
but while loading the edit entry of a DeleteSnapshotOp. After investigation, we 
found this is a regression caused by HDFS-3689, where the snapshot diff is not 
fully cleaned up after concat.

I will post the unit test to repro this and the fix for it shortly.

{code}
org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)

{code} 
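
For readers trying to reproduce the general shape of the problem, a hedged sketch of 
a repro follows. This is not the unit test attached to this issue, the exact sequence 
that triggers the corruption may differ, and the helpers used (MiniDFSCluster, 
DFSTestUtil, DFSConfigKeys) are the standard HDFS test utilities; the usual test 
imports are assumed.

{code}
// Sketch only: concat files under a snapshotted directory, then exercise
// snapshot diff / snapshot deletion on that directory.
Configuration conf = new Configuration();
// Keep blocks full so the concat preconditions hold in this toy setup.
conf.setLong(DFSConfigKeys.DFS_NAMENODE_MIN_BLOCK_SIZE_KEY, 0);
conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 1024);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
try {
  cluster.waitActive();
  DistributedFileSystem dfs = cluster.getFileSystem();
  Path dir = new Path("/dir");
  dfs.mkdirs(dir);
  dfs.allowSnapshot(dir);

  Path target = new Path(dir, "target.txt");
  Path src0 = new Path(dir, "0.txt");
  Path src1 = new Path(dir, "1.txt");
  DFSTestUtil.createFile(dfs, target, 1024L, (short) 1, 0L);
  DFSTestUtil.createFile(dfs, src0, 1024L, (short) 1, 0L);
  DFSTestUtil.createFile(dfs, src1, 1024L, (short) 1, 0L);

  dfs.createSnapshot(dir, "s0");
  // concat removes the source files; per this report the directory's snapshot
  // diff may not be fully cleaned up afterwards.
  dfs.concat(target, new Path[] {src0, src1});
  dfs.createSnapshot(dir, "s1");

  // Per the report, either of these operations can surface the corrupted diff.
  dfs.getSnapshotDiffReport(dir, "s0", "s1");
  dfs.deleteSnapshot(dir, "s0");
} finally {
  cluster.shutdown();
}
{code}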





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356279#comment-16356279
 ] 

Xiaoyu Yao commented on HDFS-13120:
---

*Error stack with SnapshotDiffReport*
{code}
org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): 
Illegal Capacity: -2
at java.util.ArrayList.<init>(ArrayList.java:156)
at org.apache.hadoop.hdfs.util.Diff.apply2Previous(Diff.java:363)
at org.apache.hadoop.hdfs.util.Diff.apply2Current(Diff.java:413)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:240)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiffRecursively(DirectorySnapshottableFeature.java:466)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:332)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:492)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:183)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6532)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1896)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1281)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)

{code}

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>
> The snapshot diff can be corrupted after concat files. This could lead to 
> Assertion upon DeleteSnapshot and getSnapshotDiff operations later. 
> For example, we have seen customers hit stack trace similar to the one below 
> but during loading edit entry of DeleteSnapshotOp. After the investigation, 
> we found this is a regression caused by HDFS-3689 where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post the unit test to repro this and fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   

[jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356298#comment-16356298
 ] 

Xiao Chen commented on HDFS-13115:
--

cherry-picked to branch-3.0. thanks all

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
> List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The 
> reason is explained in the comment.
> HDFS-12985 RCAed one such case and solved it; that fixed some occurrences, but 
> we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode is removed for the given inodeid, see 
> LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
> assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock 
> wasn't"
>   + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>   final INodeFile cons = 
> fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> This jira adds a check for whether the inode has been removed, as a safeguard 
> to avoid the NullPointerException.
> It looks like the inode gets deleted from the FSDirectory map after its id is 
> returned by {{getINodeIdWithLeases()}}.
> Ideally we should find out who deleted it, as in HDFS-12985. 
> But it seems reasonable to have a safeguard here, like other code in the code 
> base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
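
For reference, a minimal sketch of the kind of safeguard described above — not the 
attached patch — with the null check applied before the inode is dereferenced:

{code}
// Illustrative only: skip ids whose inode has already been removed from the
// FSDirectory instead of dereferencing a null return value.
synchronized long getNumUnderConstructionBlocks() {
  assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
      + "acquired before counting under construction blocks";
  long numUCBlocks = 0;
  for (Long id : getINodeIdWithLeases()) {
    INode inode = fsnamesystem.getFSDirectory().getInode(id);
    if (inode == null) {
      // The file may have been deleted after its id was collected; ignore it.
      LOG.warn("Failed to find inode " + id
          + " when counting under-construction blocks; it may have been deleted.");
      continue;
    }
    final INodeFile cons = inode.asFile();
    Preconditions.checkState(cons.isUnderConstruction());
    BlockInfo[] blocks = cons.getBlocks();
    if (blocks == null) {
      continue;
    }
    for (BlockInfo b : blocks) {
      if (!b.isComplete()) {
        numUCBlocks++;
      }
    }
  }
  LOG.info("Number of blocks under construction: " + numUCBlocks);
  return numUCBlocks;
}
{code}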



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356309#comment-16356309
 ] 

Wei Yan commented on HDFS-12615:


[~elgoiri] For the #1 rebalancer, do we just create another separate umbrella 
Jira, so that we can try to close this phase2 Jira? Or do we still put the 
umbrella one under this Jira?

For the rebalancer itself, I have some code working locally, reusing lots of code 
from your prototype. I can put up a quick doc summarizing the idea, and we can 
start to merge the code back to trunk.

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks set of improvements over the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13120:
--
Status: Patch Available  (was: Open)

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concat files. This could lead to 
> Assertion upon DeleteSnapshot and getSnapshotDiff operations later. 
> For example, we have seen customers hit stack trace similar to the one below 
> but during loading edit entry of DeleteSnapshotOp. After the investigation, 
> we found this is a regression caused by HDFS-3689 where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post the unit test to repro this and fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13120:
--
Attachment: HDFS-13120.001.patch

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concat files. This could lead to 
> Assertion upon DeleteSnapshot and getSnapshotDiff operations later. 
> For example, we have seen customers hit stack trace similar to the one below 
> but during loading edit entry of DeleteSnapshotOp. After the investigation, 
> we found this is a regression caused by HDFS-3689 where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post the unit test to repro this and fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-02-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356319#comment-16356319
 ] 

Íñigo Goiri commented on HDFS-12615:


[~ywskycn] I think it makes sense to create a separate umbrella and add the design 
docs, etc. there.

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks set of improvements over the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12512) RBF: Add WebHDFS

2018-02-07 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356335#comment-16356335
 ] 

Wei Yan commented on HDFS-12512:


I'll try one more later tonight; maybe that can help avoid other parallel jobs. For 
the split and other ideas, I think we can discuss more details in the HDFS bug 
bash session.

> RBF: Add WebHDFS
> 
>
> Key: HDFS-12512
> URL: https://issues.apache.org/jira/browse/HDFS-12512
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Íñigo Goiri
>Assignee: Wei Yan
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, 
> HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch, 
> HDFS-12512.005.patch, HDFS-12512.006.patch
>
>
> The Router currently does not support WebHDFS. It needs to implement 
> something similar to {{NamenodeWebHdfsMethods}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356351#comment-16356351
 ] 

Xiaoyu Yao commented on HDFS-13120:
---

Thanks [~arpitagarwal], [~jnp] and [~szetszwo] for the offline discussion on the 
fix. 

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13120.001.patch
>
>
> The snapshot diff can be corrupted after concat files. This could lead to 
> Assertion upon DeleteSnapshot and getSnapshotDiff operations later. 
> For example, we have seen customers hit stack trace similar to the one below 
> but during loading edit entry of DeleteSnapshotOp. After the investigation, 
> we found this is a regression caused by HDFS-3689 where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post the unit test to repro this and fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13121) NPE when request file descriptors when SC read

2018-02-07 Thread Gang Xie (JIRA)
Gang Xie created HDFS-13121:
---

 Summary: NPE when request file descriptors when SC read
 Key: HDFS-13121
 URL: https://issues.apache.org/jira/browse/HDFS-13121
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Gang Xie


Recently, we hit an issue where the DFSClient throws an NPE. The case is that the 
application process exceeds its open-file limit. In that case, libhadoop never 
throws an exception but returns null for the requested fds. requestFileDescriptors 
then uses the returned fds directly without any check, which leads to the NPE. 

We need to add a null-pointer sanity check here.


{code}
private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
    Slot slot) throws IOException {
  ShortCircuitCache cache = clientContext.getShortCircuitCache();
  final DataOutputStream out =
      new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
  SlotId slotId = slot == null ? null : slot.getSlotId();
  new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
      failureInjector.getSupportsReceiptVerification());
  DataInputStream in = new DataInputStream(peer.getInputStream());
  BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
      PBHelperClient.vintPrefixed(in));
  DomainSocket sock = peer.getDomainSocket();
  failureInjector.injectRequestFileDescriptorsFailure();
  switch (resp.getStatus()) {
  case SUCCESS:
    byte buf[] = new byte[1];
    FileInputStream[] fis = new FileInputStream[2];
    sock.recvFileInputStreams(fis, buf, 0, buf.length);
    ShortCircuitReplica replica = null;
    try {
      ExtendedBlockId key =
          new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
      if (buf[0] == USE_RECEIPT_VERIFICATION.getNumber()) {
        LOG.trace("Sending receipt verification byte for slot {}", slot);
        sock.getOutputStream().write(0);
      }
      replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
          Time.monotonicNow(), slot);
{code}
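
The two calls highlighted in the original report are {{sock.recvFileInputStreams(fis, 
buf, 0, buf.length)}} and the {{ShortCircuitReplica}} constructor that consumes 
{{fis[0]}} and {{fis[1]}}. A minimal sketch of the kind of sanity check being 
proposed (illustrative only, not an actual patch) could sit between the two calls:

{code}
// Sketch: fail with a descriptive IOException instead of a later NPE when the
// domain socket handed back no file descriptors (e.g. fd limit exhausted).
if (fis[0] == null || fis[1] == null) {
  throw new IOException("requestFileDescriptors: received null file descriptors"
      + " from " + sock + "; the client may have run out of open file descriptors");
}
{code}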



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13122) Tailing edits should not update quota counts on ObserverNode

2018-02-07 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-13122:
---
Summary: Tailing edits should not update quota counts on ObserverNode  
(was: FSImage should not update quota counts on ObserverNode)

> Tailing edits should not update quota counts on ObserverNode
> 
>
> Key: HDFS-13122
> URL: https://issues.apache.org/jira/browse/HDFS-13122
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
> {code}
> updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), 
> target.dir.rootDir);
> {code}
> to update the quota counts for the entire namespace, which can be very 
> expensive. This makes sense if we are about to become the ANN, since we need 
> valid quotas, but not on an ObserverNode which does not need to enforce 
> quotas.
> This is related to increasing the frequency with which the SbNN can tail 
> edits from the ANN to decrease the lag time for transactions to appear on the 
> Observer.
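
A rough sketch of the shape such a change might take (illustrative only; the flag 
name and how it would be set are assumptions, not the actual patch):

{code}
// Only a node that may become active needs namespace-wide quota counts, so an
// ObserverNode tailing edits could skip the expensive recount.
if (updateQuotaCounts) {   // hypothetical flag, false while serving as an Observer
  updateCountForQuota(target.getBlockManager().getStoragePolicySuite(),
      target.dir.rootDir);
}
{code}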



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


