from:"Zesheng Wu \(JIRA\)"


[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136853#comment-14136853
 ] 

Zesheng Wu commented on HDFS-7068:
--

bq. Indeed, right now the storage policies in HDFS-6584 only select replica 
locations based on storage types. I think it should be possible to extended the 
mechanism to cover placement policies in general. Basically, the 
BlockStoragePolicy class can include more hints/requirements for the 
chooseTargets method.
I still think storage types and replica locations are two orthogonal things, 
extend the block placement policy will be more suitable. Anyway, you suggestion 
is very valuable, and I'm not so familiar with HDFS-6584, I will spend some 
time to check it.

bq. The use case you gave is very interesting (erasure coding for a subtree). 
Do you have more examples what customized placement policies you need?
The erasure coding example is what we currently encountered in our environment. 
There's no other obvious case for us so far, maybe other folks can give more 
examples.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> According to the code, the current implement of HDFS only supports one 
> specific type of block placement policy, which is BlockPlacementPolicyDefault 
> by default.
> The default policy is enough for most of the circumstances, but under some 
> special circumstances, it works not so well.
> For example, on a shared cluster, we want to erasure encode all the files 
> under some specified directories. So the files under these directories need 
> to use a new placement policy.
> But at the same time, other files still use the default placement policy. 
> Here we need to support multiple placement policies for the HDFS.
> One plain thought is that, the default placement policy is still configured 
> as the default. On the other hand, HDFS can let user specify customized 
> placement policy through the extended attributes(xattr). When the HDFS choose 
> the replica targets, it firstly check the customized placement policy, if not 
> specified, it fallbacks to the default one. 
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7068) Support multiple block placement policies


[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136662#comment-14136662
 ] 

Zesheng Wu commented on HDFS-7068:
--

[~zhz], Thanks for reply.
To my understanding,  storage policies and block placement policies are 
different things, the former is used to determine which storage type is to be 
used, the later is used to determine where a replica is to be placed.

> Support multiple block placement policies
> -
>
> Key: HDFS-7068
> URL: https://issues.apache.org/jira/browse/HDFS-7068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> According to the code, the current implement of HDFS only supports one 
> specific type of block placement policy, which is BlockPlacementPolicyDefault 
> by default.
> The default policy is enough for most of the circumstances, but under some 
> special circumstances, it works not so well.
> For example, on a shared cluster, we want to erasure encode all the files 
> under some specified directories. So the files under these directories need 
> to use a new placement policy.
> But at the same time, other files still use the default placement policy. 
> Here we need to support multiple placement policies for the HDFS.
> One plain thought is that, the default placement policy is still configured 
> as the default. On the other hand, HDFS can let user specify customized 
> placement policy through the extended attributes(xattr). When the HDFS choose 
> the replica targets, it firstly check the customized placement policy, if not 
> specified, it fallbacks to the default one. 
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7044) Support retention policy based on access time and modify time, use XAttr to store policy


[ 
https://issues.apache.org/jira/browse/HDFS-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135548#comment-14135548
 ] 

Zesheng Wu commented on HDFS-7044:
--

bq. The same arguments that said that HDFS-6382 should be outside the NN apply 
here as well.
Yes, can't agree any more. [~aw]

> Support retention policy based on access time and modify time, use XAttr to 
> store policy
> 
>
> Key: HDFS-7044
> URL: https://issues.apache.org/jira/browse/HDFS-7044
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: Retention policy design.pdf
>
>
> The basic idea is set retention policy on directory based on access time and 
> modify time and use XAttr to store policy.
> Files under directory which have retention policy will be delete if meet the 
> retention rule.
> There are three rule:
> # access time 
> #* If (accessTime + retentionTimeForAccess < now), the file will be delete
> # modify time
> #* If (modifyTime + retentionTimeForModify < now), the file will be delete
> # access time and modify time
> #* If (accessTime + retentionTimeForAccess < now && modifyTime + 
> retentionTimeForModify < now ), the file will be delete



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7044) Support retention policy based on access time and modify time, use XAttr to store policy


[ 
https://issues.apache.org/jira/browse/HDFS-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135425#comment-14135425
 ] 

Zesheng Wu commented on HDFS-7044:
--

bq. HDFS-6382 is standalone daemon outside NameNode, HDFS-7044 will be inside 
NameNode, I believe HDFS-7044 will be more simple and efficient.

Yes, HDFS-6382 implements a standalone daemon, but it's not hard to start the 
TtlManager inside NameNode, which is something like the trash emptier.  What's 
more important for HDFS-6382 is to supply the possibility for implementing a 
general mechanism which can support various kinds of policies over the 
namespace.

> Support retention policy based on access time and modify time, use XAttr to 
> store policy
> 
>
> Key: HDFS-7044
> URL: https://issues.apache.org/jira/browse/HDFS-7044
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: Retention policy design.pdf
>
>
> The basic idea is set retention policy on directory based on access time and 
> modify time and use XAttr to store policy.
> Files under directory which have retention policy will be delete if meet the 
> retention rule.
> There are three rule:
> # access time 
> #* If (accessTime + retentionTimeForAccess < now), the file will be delete
> # modify time
> #* If (modifyTime + retentionTimeForModify < now), the file will be delete
> # access time and modify time
> #* If (accessTime + retentionTimeForAccess < now && modifyTime + 
> retentionTimeForModify < now ), the file will be delete



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HDFS-7068) Support multiple block placement policies

Zesheng Wu created HDFS-7068:


 Summary: Support multiple block placement policies
 Key: HDFS-7068
 URL: https://issues.apache.org/jira/browse/HDFS-7068
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.5.1
Reporter: Zesheng Wu
Assignee: Zesheng Wu


According to the code, the current implement of HDFS only supports one specific 
type of block placement policy, which is BlockPlacementPolicyDefault by default.
The default policy is enough for most of the circumstances, but under some 
special circumstances, it works not so well.

For example, on a shared cluster, we want to erasure encode all the files under 
some specified directories. So the files under these directories need to use a 
new placement policy.
But at the same time, other files still use the default placement policy. Here 
we need to support multiple placement policies for the HDFS.

One plain thought is that, the default placement policy is still configured as 
the default. On the other hand, HDFS can let user specify customized placement 
policy through the extended attributes(xattr). When the HDFS choose the replica 
targets, it firstly check the customized placement policy, if not specified, it 
fallbacks to the default one. 

Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-08-28 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114847#comment-14114847
 ] 

Zesheng Wu commented on HDFS-6898:
--

Oh, sorry, I missed that detail in the patch. 
One more question, in the append() method :
{code}
ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
replicaInfo.getBlockId(), replicaInfo.getNumBytes(), newGS,
v, newBlkFile.getParentFile(), Thread.currentThread(), 
estimateBlockLen);
{code}
Here reserves {{estimateBlockLen}} bytes

{code}
 v.reserveSpaceForRbw(estimateBlockLen - replicaInfo.getNumBytes());
{code}
Here reserves {{estimateBlockLen - replicaInfo.getNumBytes()}} bytes

What's the difference between these two?

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created

2014-08-28 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114824#comment-14114824
 ] 

Zesheng Wu commented on HDFS-6898:
--

[~arpitagarwal], what about append data to existing blocks, should we also need 
to reserve space?

> DN must reserve space for a full block when an RBW block is created
> ---
>
> Key: HDFS-6898
> URL: https://issues.apache.org/jira/browse/HDFS-6898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Gopal V
>Assignee: Arpit Agarwal
> Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, 
> HDFS-6898.04.patch, HDFS-6898.05.patch
>
>
> DN will successfully create two RBW blocks on the same volume even if the 
> free space is sufficient for just one full block.
> One or both block writers may subsequently get a DiskOutOfSpace exception. 
> This can be avoided by allocating space up front.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes

2014-08-26 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6827:
-

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Duplicate of HADOOP-10251.

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes

2014-08-26 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110645#comment-14110645
 ] 

Zesheng Wu commented on HDFS-6827:
--

[~vinayrpet], I verified your patch of HADOOP-10251 on my cluster, it works as 
expected. Thanks.
I will resolve this issue as 'duplicated'.

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes

2014-08-21 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106364#comment-14106364
 ] 

Zesheng Wu commented on HDFS-6827:
--

Thanks [~vinayrpet], I will try the scenario in the latest trunk code soon.

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes

2014-08-21 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105261#comment-14105261
 ] 

Zesheng Wu commented on HDFS-6827:
--

Ping [~vinayrpet]..

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes

2014-08-20 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103557#comment-14103557
 ] 

Zesheng Wu commented on HDFS-6827:
--

I gone through your patch of HADOOP-10251.
{code}
private synchronized void setLastServiceStatus(HAServiceStatus status) {
this.lastServiceState = status;
for (ServiceStateCallback cb : serviceStateCallbacks) {
  cb.reportServiceStatus(lastServiceState);
}
  }
{code}
The above method is only called by the {{HealthMonitor}}, so the state check 
will only be performed during the health check. How does your patch handle the 
following two scenarios(NN1 is Active, NN2 is standby):
# NN1 is restarted, but ZKFC1 isn't aware of it, ZKFC1 thinks NN1 is healthy, 
last state is Active, current state is Standby
# Do a graceful failover from the command line tool, ZKFC1 thinks NN1 is 
healthy, last state is Active, current state is Standby

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103505#comment-14103505
 ] 

Zesheng Wu commented on HDFS-6827:
--

[~vinayrpet], Can you verify according to the reproduce method as I described 
[here | 
https://issues.apache.org/jira/browse/HDFS-6827?focusedCommentId=14101725&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14101725]?
 
Thanks.

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103500#comment-14103500
 ] 

Zesheng Wu commented on HDFS-6827:
--

I've looked into HADOOP-10251,  it's quite different from this issue.
The root cause of this issue is that ANN's ZKFC isn't aware that ANN is 
retarted and doesn't trigger failover.

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103278#comment-14103278
 ] 

Zesheng Wu commented on HDFS-6827:
--

bq. As I described in the issue description, our NN came up inside about 6 
seconds. 1 second for Client#handleConnectionFailure() sleep, the other 5 
seconds for some unknown reasons, maybe GC or network problems, we haven't 
found direct evidences.

Sorry, this description is not very accurate.
Our NN came up inside about 6 seconds. And the ZKFC retried connection exactly 
after NN starting successfully. There are about 6 seconds between ZKFC detected 
'Connection reset by peer' and reconnected NN successfully. 1 second for 
{{Client#handleConnectionFailure()}} sleep is definitely, the other 5 seconds 
for some unknown reasons, maybe GC or network problems, we haven't found direct 
evidences.


> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6827) Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the target's status changing sometimes


 [ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6827:
-

Summary: Both NameNodes stuck in STANDBY state due to HealthMonitor not 
aware of the target's status changing sometimes  (was: Both NameNodes could be 
in STANDBY State due to HealthMonitor not aware of the target's status changing 
sometimes)

> Both NameNodes stuck in STANDBY state due to HealthMonitor not aware of the 
> target's status changing sometimes
> --
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes could be in STANDBY State due to HealthMonitor not aware of the target's status changing sometimes


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103261#comment-14103261
 ] 

Zesheng Wu commented on HDFS-6827:
--

Thanks [~stack]:)

bq. Yes. Is there anything more definitive than a check for a particular state 
transition? (Sorry, don't know this area well).
If we want to fix the bug inside ZKFC, there's no other definitive indicator 
according to my current knowledge of ZKFC. 

bq. This seems less prone to misinterpretation.
Yes, this is more straightforward and less prone to misinterpretation. But 
change the {{MonitorHealthResponseProto}} proto may introduce an incompatible 
changing, if folks think this is acceptable, perhaps we can use this method.

bq. Your NN came up inside a second?
As I described in the issue description, our NN came up inside about 6 seconds. 
1 second for {{Client#handleConnectionFailure()}} sleep, the other 5 seconds 
for some unknown reasons, maybe GC or network problems, we haven't found direct 
evidences. 

bq. A hacky workaround in meantime would have the NN start sleep first for a 
second?
Yes,  we can let NN sleep sometime before startup. Indeed we use this method to 
quick fix the bug in our production cluster temporarily. But for a long term 
and general solution, we should fix this in the ZKFC side. One more thing, ZKFC 
is a general automatic HA failover framework, it is used in HDFS, but not only 
for HDFS, it may be used in other system who needs automatic HA failover. From 
this perspective, we should fix this inside ZKFC.


> Both NameNodes could be in STANDBY State due to HealthMonitor not aware of 
> the target's status changing sometimes
> -
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes could be in STANDBY State due to HealthMonitor not aware of the target's status changing sometimes

2014-08-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101781#comment-14101781
 ] 

Zesheng Wu commented on HDFS-6827:
--

bq. In the below test, is it possible the state gets moved to standby before 
the service healthy check runs? Would the test get stuck in this case?
It's not possible, the state is only changed during health checking or graceful 
failover, here's no graceful failover, so it's only changed by the health 
monitor. If the health monitor doesn't run, the state won't be moved.

> Both NameNodes could be in STANDBY State due to HealthMonitor not aware of 
> the target's status changing sometimes
> -
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes could be in STANDBY State due to HealthMonitor not aware of the target's status changing sometimes

2014-08-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101775#comment-14101775
 ] 

Zesheng Wu commented on HDFS-6827:
--

bq. BTW: The healthCheckLock is used to distinguish the graceful failover and 
the above scenario. 
bq. I don't know this code well. How is the above done?
bq. The code in checkServiceStatus seems 'fragile' looking for a explicity 
transition. Is there a more explicit check that can be done to learn if 
'service is restarted'?
The solution for this issue is to let the ZKFC learn 'service is restarted'. 

One straightforward way is to add a field in the {{MonitorHealthResponseProto}} 
to identify that the service is restarted, for example the pid of the NN 
process, or a generated UUID will satisfy our requirement. Another way is to 
let the ZKFC learn 'service is restarted' by comparing the service's current 
state and last state. We choose the later one, in this way we can fix the 
problem inside ZKFC, don't influence other services. 

As we know that ZKFC supports gracefully failover from the command line tool, 
and during graceful failover, the ZKFC may encounter a scenario like this: the 
last state of the service Active, the current is Standby, and the service is 
healthy. This scenario is just the same as the buggy scenario described above, 
we must distinguish these two scenarios. So we  add the {{healthCheckLock}} to 
'fragile' the health checking when doing graceful failover.

Hope I expressed myself clearly:)

> Both NameNodes could be in STANDBY State due to HealthMonitor not aware of 
> the target's status changing sometimes
> -
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) Both NameNodes could be in STANDBY State due to HealthMonitor not aware of the target's status changing sometimes

2014-08-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101725#comment-14101725
 ] 

Zesheng Wu commented on HDFS-6827:
--

Thanks [~stack].
bq. This issue looks a little ugly. NameNodes stuck in standby mode? This 
production? What did it look like?
Yes, both NameNodes stuck in standby mode, and the hbase cluster over it 
coundn't read/write any more.
We can reproduce the issue in the following way:
1. Change the sleep time of {{Client#handleConnectionFailure()}} longer
{code}
 try {
Thread.sleep(action.delayMillis); // default is 1s, can change to 10s 
or longer
  } catch (InterruptedException e) {
throw (IOException)new InterruptedIOException("Interrupted: action="
+ action + ", retry policy=" + connectionRetryPolicy).initCause(e);
  }
{code}
2. Restart the active NameNode quickly, and ensure that the NameNode starts 
successfully before ZKFC retrying connect.

> Both NameNodes could be in STANDBY State due to HealthMonitor not aware of 
> the target's status changing sometimes
> -
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) NameNode double standby

2014-08-08 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090541#comment-14090541
 ] 

Zesheng Wu commented on HDFS-6827:
--

Hi [~arpitagarwal], can you help review this patch?

> NameNode double standby
> ---
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6827) NameNode double standby


 [ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6827:
-

Priority: Critical  (was: Major)

> NameNode double standby
> ---
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Critical
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6827) NameNode double standby


[ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087395#comment-14087395
 ] 

Zesheng Wu commented on HDFS-6827:
--

Submit a patch to fix this bug.
BTW: The {{healthCheckLock}} is used to distinguish the graceful failover and 
the above scenario. 

> NameNode double standby
> ---
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6827) NameNode double standby


 [ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6827:
-

Status: Patch Available  (was: Open)

> NameNode double standby
> ---
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6827) NameNode double standby


 [ 
https://issues.apache.org/jira/browse/HDFS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6827:
-

Attachment: HDFS-6827.1.patch

> NameNode double standby
> ---
>
> Key: HDFS-6827
> URL: https://issues.apache.org/jira/browse/HDFS-6827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.4.1
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6827.1.patch
>
>
> In our production cluster, we encounter a scenario like this: ANN crashed due 
> to write journal timeout, and was restarted by the watchdog automatically, 
> but after restarting both of the NNs are standby.
> Following is the logs of the scenario:
> # NN1 is down due to write journal timeout:
> {color:red}2014-08-03,23:02:02,219{color} INFO 
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
> # ZKFC1 detected "connection reset by peer"
> {color:red}2014-08-03,23:02:02,560{color} ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
> {color:red}Connection reset by peer{color}
> # NN1 wat restarted successfully by the watchdog:
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Web-server up at: xx:13201
> 2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> {color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
> IPC Server listener on 13200: starting
> 2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
> thread started!
> 2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Registered DFSClientInformation MBean
> 2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> NameNode up at: xx/xx:13200
> 2014-08-03,23:02:08,744 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for standby state
> # ZKFC1 retried the connection and considered NN1 was healthy
> {color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
> Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 
> SECONDS)
> # ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
> failover, as a result, both NNs were standby.
> The root cause of this bug is that NN is restarted too quickly and ZKFC 
> health monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HDFS-6827) NameNode double standby

2014-08-05 Thread Zesheng Wu (JIRA)

Zesheng Wu created HDFS-6827:


 Summary: NameNode double standby
 Key: HDFS-6827
 URL: https://issues.apache.org/jira/browse/HDFS-6827
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.4.1
Reporter: Zesheng Wu
Assignee: Zesheng Wu


In our production cluster, we encounter a scenario like this: ANN crashed due 
to write journal timeout, and was restarted by the watchdog automatically, but 
after restarting both of the NNs are standby.

Following is the logs of the scenario:
# NN1 is down due to write journal timeout:
{color:red}2014-08-03,23:02:02,219{color} INFO 
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG
# ZKFC1 detected "connection reset by peer"
{color:red}2014-08-03,23:02:02,560{color} ERROR 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:xx@xx.HADOOP (auth:KERBEROS) cause:java.io.IOException: 
{color:red}Connection reset by peer{color}
# NN1 wat restarted successfully by the watchdog:
2014-08-03,23:02:07,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Web-server up at: xx:13201
2014-08-03,23:02:07,884 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
{color:red}2014-08-03,23:02:07,884{color} INFO org.apache.hadoop.ipc.Server: 
IPC Server listener on 13200: starting
2014-08-03,23:02:08,742 INFO org.apache.hadoop.ipc.Server: RPC server clean 
thread started!
2014-08-03,23:02:08,743 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Registered DFSClientInformation MBean
2014-08-03,23:02:08,744 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
NameNode up at: xx/xx:13200
2014-08-03,23:02:08,744 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for standby state
# ZKFC1 retried the connection and considered NN1 was healthy
{color:red}2014-08-03,23:02:08,292{color} INFO org.apache.hadoop.ipc.Client: 
Retrying connect to server: xx/xx:13200. Already tried 0 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
# ZKFC1 still considered NN1 as a healthy Active NN, and didn't trigger the 
failover, as a result, both NNs were standby.

The root cause of this bug is that NN is restarted too quickly and ZKFC health 
monitor doesn't realize that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6526) Implement HDFS TtlManager

2014-07-25 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075238#comment-14075238
 ] 

Zesheng Wu commented on HDFS-6526:
--

Thanks for feedback, [~daryn].
# About scalability and performance issues: currently the default 
implementation just simply traverses the whole directory tree, we can improve 
this to that each instance traverses a sub directory tree, and can deploy 
multiple instances on demand.
# About bugs: can you point out in details, I would like to figure them out as 
soon as possible.
# bq. Would you please elaborate on whether you plan to simply have the trash 
emptier and ttl manager run as distinct services in the same adjunct process? 
Or do you plan on the emptier actually leveraging/relying on ttls?
As I mentioned in the design document in HDFS-6382, we want to supply a general 
mechanism that can run various kinds of policies on the namespace, for example 
TTL is one of the policies which is intended to clean expired files and 
directories.  In the same way, we can extend to implement a trash policy to 
work in the way that is the same as the current trash emptier. If folks come to 
an agreement that move the trash emptier is reasonable, we will change 
'TtlManager' to a more general name.
# bq. As a general comment, a feature like this is attractive but may be highly 
dangerous. You many want to consider means to safeguard against a severely 
skewed system clock, else the ttl manager might go on mass murder spree in the 
filesystem
Yes, I agree with you on this very much. The current patch is just an initial 
implementation, we will improve to support this in the future iteration.
Hope that I expressed myself clearly:)

> Implement HDFS TtlManager
> -
>
> Key: HDFS-6526
> URL: https://issues.apache.org/jira/browse/HDFS-6526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6526.1.patch
>
>
> This issue is used to track development of HDFS TtlManager, for details see 
> HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-07-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067254#comment-14067254
 ] 

Zesheng Wu commented on HDFS-6382:
--

Thanks [~cutting]
I will move the trash emptier into TTL daemon after HDFS-6525 and HDFS-6526 are 
resolved.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design-3.pdf, 
> HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-30 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.3.patch

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
> HDFS-6596.2.patch, HDFS-6596.3.patch, HDFS-6596.3.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-30 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.3.patch

Fixed javadoc warnings.

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
> HDFS-6596.2.patch, HDFS-6596.3.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-29 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.2.patch

Mmm, Unrelated test failure, just resubmit...

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
> HDFS-6596.2.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-29 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.2.patch

Failed test is unrelated, just resubmit the patch to trigger Jenkins.

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-27 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.2.patch

Uploaded a polished version.

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-25 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Status: Patch Available  (was: Open)

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-25 Thread Zesheng Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6596:
-

Attachment: HDFS-6596.1.patch

An initial patch.
Since {{DataInputStream#readfully(byte[], int, int)}} is final and 
{{FSDataInputStream}} can't override it, so we implement a readFully with 
ByteBuffer.

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6596.1.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-24 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042986#comment-14042986
 ] 

Zesheng Wu commented on HDFS-6596:
--

Thanks Colin.
bq.  What you are proposing is basically making every {{read}} into a 
{{readFully}}. I don't think we want to increase the number of differences 
between how DFSInputStream works and how "normal" Java input streams work. The 
"normal" java behavior also has a good reason behind it... clients who can deal 
with partial reads will get a faster response time if the stream just returns 
what it can rather than waiting for everything. In the case of HDFS, waiting 
for everything might mean connecting to a remote DataNode. This could be quite 
a lot of latency.
I agree with you that we shouldn't make every {{read}} into a {{readFully}}, 
and the current implementation of {{read}} has its advantage as you described.

About the solution, I think that we do it in Hadoop will be better, because all 
users will be benefited.
The current {{readFully}} for DFSInputStream is implemented as pread and 
inherits from FSInputStream, so I will a new {{readFully(buffer, offset, 
length)}} to figure this out.  Any thoughts?

> Improve InputStream when read spans two blocks
> --
>
> Key: HDFS-6596
> URL: https://issues.apache.org/jira/browse/HDFS-6596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that the read will return at most 
> (blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
> caller must call read() second time to complete the request, and must wait 
> second time to acquire the DFSInputStream lock(read() is synchronized for 
> DFSInputStream). For latency sensitive applications, such as hbase, this will 
> result in latency pain point when they under massive race conditions. So here 
> we propose that we should loop internally in read() to do best effort read.
> In the current implementation of pread(read(position, buffer, offset, 
> lenght)), it does loop internally to do best effort read. So we can refactor 
> to support this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HDFS-6596) Improve InputStream when read spans two blocks

Zesheng Wu created HDFS-6596:


 Summary: Improve InputStream when read spans two blocks
 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


In the current implementation of DFSInputStream, read(buffer, offset, length) 
is implemented as following:
{code}
int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
if (locatedBlocks.isLastBlockComplete()) {
  realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
}
int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
{code}
>From the above code, we can conclude that the read will return at most 
>(blockEnd - pos + 1) bytes. As a result, when read spans two blocks, the 
>caller must call read() second time to complete the request, and must wait 
>second time to acquire the DFSInputStream lock(read() is synchronized for 
>DFSInputStream). For latency sensitive applications, such as hbase, this will 
>result in latency pain point when they under massive race conditions. So here 
>we propose that we should loop internally in read() to do best effort read.

In the current implementation of pread(read(position, buffer, offset, lenght)), 
it does loop internally to do best effort read. So we can refactor to support 
this on normal read.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL


 [ 
https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6525:
-

Attachment: HDFS-6525.2.patch

Updated to address [~daryn]'s comments.

BTW: Daryn, the path is only printed when the ttl is inherited, so I will keep 
the format 'ttl : [path]".

> FsShell supports HDFS TTL
> -
>
> Key: HDFS-6525
> URL: https://issues.apache.org/jira/browse/HDFS-6525
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6525.1.patch, HDFS-6525.2.patch
>
>
> This issue is used to track development of supporting  HDFS TTL for FsShell, 
> for details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6525) FsShell supports HDFS TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041572#comment-14041572
 ] 

Zesheng Wu commented on HDFS-6525:
--

Thanks [~daryn], I will update the patch to address your comments immediately.

> FsShell supports HDFS TTL
> -
>
> Key: HDFS-6525
> URL: https://issues.apache.org/jira/browse/HDFS-6525
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6525.1.patch
>
>
> This issue is used to track development of supporting  HDFS TTL for FsShell, 
> for details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040683#comment-14040683
 ] 

Zesheng Wu commented on HDFS-6382:
--

Hi guys, I've uploaded an initial implementation on HDFS-6525 and HDFS-6526 
separately, hope you can take a look at, any comments will be appreciated. 
Thanks in advance.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design-3.pdf, 
> HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager


 [ 
https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6526:
-

Target Version/s: 2.5.0
  Status: Patch Available  (was: Open)

> Implement HDFS TtlManager
> -
>
> Key: HDFS-6526
> URL: https://issues.apache.org/jira/browse/HDFS-6526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6526.1.patch
>
>
> This issue is used to track development of HDFS TtlManager, for details see 
> HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager


 [ 
https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6526:
-

Attachment: HDFS-6526.1.patch

Initial implementation, the unit test depends on HDFS-6525, so we should commit 
HDFS-6525 before commit this.

> Implement HDFS TtlManager
> -
>
> Key: HDFS-6526
> URL: https://issues.apache.org/jira/browse/HDFS-6526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6526.1.patch
>
>
> This issue is used to track development of HDFS TtlManager, for details see 
> HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL


 [ 
https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6525:
-

Target Version/s: 2.5.0
  Status: Patch Available  (was: Open)

> FsShell supports HDFS TTL
> -
>
> Key: HDFS-6525
> URL: https://issues.apache.org/jira/browse/HDFS-6525
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6525.1.patch
>
>
> This issue is used to track development of supporting  HDFS TTL for FsShell, 
> for details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6525) FsShell supports HDFS TTL


 [ 
https://issues.apache.org/jira/browse/HDFS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6525:
-

Attachment: HDFS-6525.1.patch

Initial implementation.

> FsShell supports HDFS TTL
> -
>
> Key: HDFS-6525
> URL: https://issues.apache.org/jira/browse/HDFS-6525
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6525.1.patch
>
>
> This issue is used to track development of supporting  HDFS TTL for FsShell, 
> for details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL


 [ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6382:
-

Attachment: HDFS-TTL-Design-3.pdf

Update the document according to the implementation.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design-3.pdf, 
> HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-22 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040435#comment-14040435
 ] 

Zesheng Wu commented on HDFS-6507:
--

Thanks [~vinayrpet] for reviewing the patch.
Thanks all for feedback.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Fix For: 2.5.0
>
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038235#comment-14038235
 ] 

Zesheng Wu commented on HDFS-6507:
--

Hi [~vinayrpet], it seems that folks do not have other opinions.
Can you help to commit the patch?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037078#comment-14037078
 ] 

Zesheng Wu commented on HDFS-6507:
--

Since it still has some potential value when used in clusters with multiple 
namespaces in the future, we keep it now?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037069#comment-14037069
 ] 

Zesheng Wu commented on HDFS-6507:
--

I eventually catch your point:)
But I think ":8020" is still better than null, especially for 
cluster with multiple namespaces, how do you think?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037058#comment-14037058
 ] 

Zesheng Wu commented on HDFS-6507:
--

If we remove this, the log message will like this "Refresh xx for null 
success", right?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037044#comment-14037044
 ] 

Zesheng Wu commented on HDFS-6507:
--

You mean that we should revert the change to ProxyAndInfo and remove addresses 
in logs messages also?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-18 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036720#comment-14036720
 ] 

Zesheng Wu commented on HDFS-6507:
--

Thanks for feedback [~jingzhao]

I still think that we should send requests to all NNs. If people thinks some 
commands are heavy(just as you mentioned),  they can use the {{-fs}} option to 
specify a NN to do his operation. 
As the current default behavior of the commands(setSafeMode/metaSave, etc) 
doesn't meet our expectation(you also think so),  we should correct it. The way 
you suggested is just a compromise, and it's easier to be friendly for people, 
by not easier to other tools who use the DFSAdmin shell.

How do you think?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034891#comment-14034891
 ] 

Zesheng Wu commented on HDFS-6507:
--

The failed test doesn't relate to this issue.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.8.patch

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034856#comment-14034856
 ] 

Zesheng Wu commented on HDFS-6507:
--

Yes, I will figure it out immediately

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.7.patch

Failed tests don't relate to this issue, just resubmit the #7 patch to trigger 
the jenkins.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.7.patch

Fix breaked test.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch, HDFS-6507.7.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.6.patch

The failed test {{TestCrcCorruption}} doesn't relate to this issue
I run it on my local machine and passed
The javadoc warning is also weird, I run the mvn command on my local machine, 
and there's no such warning.

Just resubmit the patch, and trigger the Jenkins.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch, 
> HDFS-6507.6.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033725#comment-14033725
 ] 

Zesheng Wu commented on HDFS-6507:
--

It seems that finalizeUpgrade() doesn't output any prompt messages, you mean 
that we should remove the printed messages?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.5.patch

Some minor polishes.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033754#comment-14033754
 ] 

Zesheng Wu commented on HDFS-6507:
--

Mmm, maybe in this case, user can use {{-fs}} generic option to specify a NN to 
operate?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033738#comment-14033738
 ] 

Zesheng Wu commented on HDFS-6507:
--

bq. One more query I have.
using this implementation we can execute commands successfully when all 
namenodes of a nameservice are up and running.
But what if standby nodes are down for maintainance and these comes first in 
configurations...?

>From my understanding, in this case we can just fail the operation, and users 
>can retry after standby nodes are up.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033737#comment-14033737
 ] 

Zesheng Wu commented on HDFS-6507:
--

OK, let me figure these too out:)

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.4.patch

Updated according to Vinay's comments.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch, 
> HDFS-6507.4.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033693#comment-14033693
 ] 

Zesheng Wu commented on HDFS-6507:
--

Oh, It seems that NameNode also has a {{ getAddress(URI filesystemURI)}}, this 
will work?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033688#comment-14033688
 ] 

Zesheng Wu commented on HDFS-6507:
--

I checked the related code, ProxyAndInfo is used in 3 places: 
{{NameNodeProxies#createProxyWithLossyRetryHandler}}, 
{{NameNodeProxies#createProxy}}, 
{{NameNodeProxies#createProxyWithLossyRetryHandler}}, in the frist place we can 
obtain NN address directly, but in the last two places, we can not obtain NN's 
address directly, we only have NN's URI. In non-HA case, we can get NN address 
by {{ NameNode.getAddress(nameNodeUri)}}, by in HA case, it seems not easy. How 
do you think?

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.3.patch

New patch addressed Vinay's review comments.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-16 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033480#comment-14033480
 ] 

Zesheng Wu commented on HDFS-6507:
--

Thanks [~vinayrpet] for reviewing the patch, all comments are reasonable to me, 
I will generate a new patch soon to address your comments.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-06-15 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032082#comment-14032082
 ] 

Zesheng Wu commented on HDFS-6382:
--

[~ste...@apache.org] Thanks for your feedback.
We have discussed that whether to use a MR job or a standalone daemon, and most 
people upstream has come to an agreement that a standalone daemon is reasonable 
and acceptable. You can go through the earlier discussion.  

[~aw] Thanks for your feedback.
Your suggestion is really valuable and firms our confidence to implement it as 
a standalone daemon.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.2.patch

Fix breaked tests.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030507#comment-14030507
 ] 

Zesheng Wu commented on HDFS-6507:
--

Mmm, seems that some test failed, I will figure it out soon.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6526) Implement HDFS TtlManager


 [ 
https://issues.apache.org/jira/browse/HDFS-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6526:
-

Description: This issue is used to track development of HDFS TtlManager, 
for details see HDFS-6382.  (was: This issue is used to track development of 
HDFS TtlManager, for details see HDFS -6382.)

> Implement HDFS TtlManager
> -
>
> Key: HDFS-6526
> URL: https://issues.apache.org/jira/browse/HDFS-6526
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> This issue is used to track development of HDFS TtlManager, for details see 
> HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030366#comment-14030366
 ] 

Zesheng Wu commented on HDFS-6382:
--

I filed two sub-tasks to track the development of this feature.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HDFS-6526) Implement HDFS TtlManager

Zesheng Wu created HDFS-6526:


 Summary: Implement HDFS TtlManager
 Key: HDFS-6526
 URL: https://issues.apache.org/jira/browse/HDFS-6526
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


This issue is used to track development of HDFS TtlManager, for details see 
HDFS -6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HDFS-6525) FsShell supports HDFS TTL

Zesheng Wu created HDFS-6525:


 Summary: FsShell supports HDFS TTL
 Key: HDFS-6525
 URL: https://issues.apache.org/jira/browse/HDFS-6525
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


This issue is used to track development of supporting  HDFS TTL for FsShell, 
for details see HDFS-6382.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Tags: dfsadmin
Target Version/s: 3.0.0, 2.5.0  (was: 3.0.0)
  Status: Patch Available  (was: Open)

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better


 [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6507:
-

Attachment: HDFS-6507.1.patch

Attached initial version of the implementation.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-12 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029025#comment-14029025
 ] 

Zesheng Wu commented on HDFS-6507:
--

OK, I'll go ahead for implementation. 

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-12 Thread Zesheng Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028986#comment-14028986
 ] 

Zesheng Wu commented on HDFS-6507:
--

[~vinayrpet], Thanks for your feedback.
bq. Only worry is, if commands pass on one namenode and fails on another 
namenode, how to handle the failures.? whether just log the errors or rollback?
>From what I know so far, all the operations mentioned in the description are 
>idempotent, so when a command fails on one(some) of the NNs, users can just 
>retry the command. What we need to do is report the error clearly to users, 
>either by logs or exit codes. Rollback is too complex.

bq. If manual operations required before command execution, then its user's 
responsibility to make sure that configurations updated at both namenodes 
before command execution.
The manual operations required before command execution can be done by tools in 
batches, for example, if users want to do refreshNodes, they can upload new 
excludes/includes file to all of the NNs without caring which NN is Active or 
Standby. After uploading the files, they just run refreshNodes command, it's 
enough.

bq. Only Active leave safemode, which on failover again puts the cluster into 
safemode.
If we decide to send set safemode for both NN, we should let the user know 
whether there's failure clearly. If there's failure, users should do retry 
until all succeed.


> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028870#comment-14028870
 ] 

Zesheng Wu commented on HDFS-6507:
--

Fix one mistake:
bq. The existing refresh commands do not accept a host:port parameter. 
As [~jingzhao] mentioned, the generic option {{-fs}} can specify host:port for 
those commands.

To summarize, the main purpose of this jira is listed below:
# Make the default action correctly and naturally: a. Commands which should 
take effect on ANN, will take effect on ANN by default; b.   Commands which 
should take effect on both ANN and SNN, will take effect on both ANN and SNN by 
default
# Improve usability: a. Users do not need to care that which NN is active and 
what the NN's host:port is, just run the command; b. Commands that should take 
effect on both ANN and SNN needn't run twice with respective host:port.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028843#comment-14028843
 ] 

Zesheng Wu commented on HDFS-6507:
--

[~jingzhao], Thanks for feedback.
I've checked HDFS-5147. I think -fs can only solve part of the problems, not 
all. This jira is intended to solve the problem thoroughly.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028814#comment-14028814
 ] 

Zesheng Wu commented on HDFS-6507:
--

bq. I don't see a significant advantage of having a --all or similar flag to 
the refresh commands, as replacing configs is still manual. Do the existing 
refresh commands accept a host:port parameter? If not we can probably add one.
The existing refresh commands do not accept a host:port parameter. I donot 
propose to add a {{--all}} or similiar flags, but to send the refresh commands 
to all NNs. This modification is transparent for current usage, just an 
improvement.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028777#comment-14028777
 ] 

Zesheng Wu commented on HDFS-6507:
--

[~chrili], Thanks for your feedback.
Waiting for [~wheat9] and [~arpitagarwal]'s opinions. If other folks have any 
suggestion, please feel free to feedback:)


> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028768#comment-14028768
 ] 

Zesheng Wu commented on HDFS-6382:
--

[~szetszwo], Thanks for your valuable suggestions.
bq. Using xattrs for TTL is a good idea. Do we really need ttl in milliseconds? 
Do you think that the daemon could guarantee such accuracy? We don't want to 
waste namenode memory space to store trailing zeros/digits for each ttl. How 
about supporting symbolic ttl notation, e.g. 10h, 5d?
Yes, I agree with you that the daemon can't guarantee milliseconds accuracy, 
and in fact there's no need to guarantee such accuracy. As you suggested, we 
can use encoded bytes to save NN's memory.

bq. The name "Supervisor" sounds too general. How about calling it "TtlManager" 
for the moment? If there are more new features added to the tool, we may change 
the name later.
OK, "TtlManager" is more suitable for the moment.

bq. For setting ttl on a directory foo, write permission permission on the 
parent directory of foo is not enough. Namenode also checks rwx for all 
subdirectories of foo for recursive delete. 
Nice catch, If we want to conform to the delete semantics mentioned by Colin, 
we should check the subdirectories recursively.

bq. BTW, permission could be changed from time to time. A user may be able to 
delete a file/dir at the time of setting TTL but the same user may not have 
permission to delete the same file/dir when the ttl expires.
The deleting work will be done by a super user(which the "TtlManager" runs as), 
seems this is not a problem?

bq. I suggest not to check additional permission requirement on setting ttl but 
run as the particular user when deleting the file. Then we need to add username 
to the ttl xattr.
Good point, but adding the username to the ttl xattr requires more space of 
NN's memory, we should do the trade-off whether it's worth doing.


> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028748#comment-14028748
 ] 

Zesheng Wu commented on HDFS-6507:
--

Since [~arpitagarwal] and [~chrili] are leading the development of 
HADOOP-10376, I would like to ask for you guys' idea about (2).  Any 
suggestions are welcome and will be very appreciated. Thanks in advance.

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better


[ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028732#comment-14028732
 ] 

Zesheng Wu commented on HDFS-6507:
--

HADOOP-10376 implements a refresh command like this {{[-refresh  
 \[arg1..argn\]}}, users should specify the {{host:ipc_port}} when they 
want to run the refresh command. From the point of of view functionality, it's 
enough; but from the point of view of usability, refresh both NNs transparently 
is more friendly for users.

Because HADOOP-10376 just supplies a new way of refreshing, and the old way 
still exists and will exist for a long time for backward compatibility, the (2) 
still has value to do, how do you think about this? [~wheat9]

> Improve DFSAdmin to support HA cluster better
> -
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage


[ 
https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028646#comment-14028646
 ] 

Zesheng Wu commented on HDFS-6503:
--

[~wheat9] Thank you for reviewing the patch:)

> Fix typo of DFSAdmin restoreFailedStorage
> -
>
> Key: HDFS-6503
> URL: https://issues.apache.org/jira/browse/HDFS-6503
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6503.patch
>
>
> Fix typo: restoreFaileStorage should be restoreFailedStorage



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL


 [ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6382:
-

Attachment: HDFS-TTL-Design -2.pdf

Updated the documents to address Colin's suggestions.
Thanks Colin for your valuable suggestions:)

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027325#comment-14027325
 ] 

Zesheng Wu commented on HDFS-6382:
--

bq. Even if it's not implemented at first, we should think about the 
configuration required here. I think we want the ability to email the admins 
when things go wrong. Possibly the notifier could be pluggable or have several 
policies. There was nothing in the doc about configuration in general, which I 
think we need to fix. For example, how is rate limiting configurable? How do we 
notify admins that the rate is too slow to finish in the time given?
OK, I will update the document and post a new version soon.

bq. You can't delete a file in HDFS unless you have write permission on the 
containing directory. Whether you have write permission on the file itself is 
not relevant. So I would expect the same semantics here (probably enforced by 
setfacl itself).
That's reasonable, I'll figure it out clearly in the document.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

[
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zesheng Wu updated HDFS-6507:
-

Description:
Currently, the commands supported in DFSAdmin can be classified into three
categories according to the protocol used:
1. ClientProtocol
Commands in this category generally implement by calling the corresponding
function of the DFSClient class, and will call the corresponding remote
implementation function at the NN side finally. At the NN side, all these
operations are classified into five categories: UNCHECKED, READ, WRITE,
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only
allows UNCHECKED operations. In the current implementation of DFSClient, it
will connect one NN first, if the first NN is not Active and the operation is
not allowed, it will failover to the second NN. So here comes the problem, some
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
operations, and when executing these commands in the DFSAdmin command line,
they will be sent to a definite NN, no matter it is Active or Standby. This may
result in two problems:
a. If the first tried NN is standby, and the operation takes effect only on
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NN, but it takes effect on
only one NN. In the future, when there is a NN failover, there may have
problems.

Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or
if the command needs to take effect on both NN, we should send the request to
both Active and Standby NNs.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol,
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol,
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl,
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
refreshCallQueue, are implemented by creating a corresponding RPC proxy and
sending the request to remote NN. In the current implementation, these requests
will be sent to a definite NN, no matter it is Active or Standby. Here I
propose that we sent these requests to both NNs.

3. ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

was:
Currently, the commands supported in DFSAdmin can be classified into three
categories according to the protocol used:
# ClientProtocol
Commands in this category generally implement by calling the corresponding
function of the DFSClient class, and will call the corresponding remote
implementation function at the NN side finally. At the NN side, all these
operations are classified into five categories: UNCHECKED, READ, WRITE,
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only
allows UNCHECKED operations. In the current implementation of DFSClient, it
will connect one NN first, if the first NN is not Active and the operation is
not allowed, it will failover to the second NN. So here comes the problem, some
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
operations, and when executing these commands in the DFSAdmin command line,
they will be sent to a definite NN, no matter it is Active or Standby. This may
result in two problems:
#* If the first tried NN is standby, and the operation takes effect only on
Standby NN, which is not the expected result.
#* If the operation needs to take effect on both NN, but it takes effect on
only one NN. In the future, when there is a NN failover, there may have
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or
if the command needs to take effect on both NN, we should send the request to
both Active and Standby NNs.

# Refresh protocols: RefreshAuthorizationPolicyProtocol,
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol,
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl,
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
refreshCallQueue, are implemented by creating a corresponding RPC proxy and
sending the request to remote NN. In the current implementation, these requests
will be sent to a definite NN, no matter it is Active or Standby. Here I
propose that we sent these requests to both NNs.

# ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

> Improve DFSAdmin to support HA cluster better
>

[jira] [Updated] (HDFS-6507) Improve DFSAdmin to support HA cluster better

[
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zesheng Wu updated HDFS-6507:
-

Description:
Currently, the commands supported in DFSAdmin can be classified into three
categories according to the protocol used:
# ClientProtocol
Commands in this category generally implement by calling the corresponding
function of the DFSClient class, and will call the corresponding remote
implementation function at the NN side finally. At the NN side, all these
operations are classified into five categories: UNCHECKED, READ, WRITE,
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only
allows UNCHECKED operations. In the current implementation of DFSClient, it
will connect one NN first, if the first NN is not Active and the operation is
not allowed, it will failover to the second NN. So here comes the problem, some
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
operations, and when executing these commands in the DFSAdmin command line,
they will be sent to a definite NN, no matter it is Active or Standby. This may
result in two problems:
#* If the first tried NN is standby, and the operation takes effect only on
Standby NN, which is not the expected result.
#* If the operation needs to take effect on both NN, but it takes effect on
only one NN. In the future, when there is a NN failover, there may have
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or
if the command needs to take effect on both NN, we should send the request to
both Active and Standby NNs.

# ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

was:
Currently, the commands supported in DFSAdmin can be classified into three
categories according to the protocol used:
1.ClientProtocol
Commands in this category generally implement by calling the corresponding
function of the DFSClient class, and will call the corresponding remote
implementation function at the NN side finally. At the NN side, all these
operations are classified into five categories: UNCHECKED, READ, WRITE,
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only
allows UNCHECKED operations. In the current implementation of DFSClient, it
will connect one NN first, if the first NN is not Active and the operation is
not allowed, it will failover to the second NN. So here comes the problem, some
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
operations, and when executing these commands in the DFSAdmin command line,
they will be sent to a definite NN, no matter it is Active or Standby. This may
result in two problems:
a. If the first tried NN is standby, and the operation takes effect only on
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NN, but it takes effect on
only one NN. In the future, when there is a NN failover, there may have
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or
if the command needs to take effect on both NN, we should send the request to
both Active and Standby NNs.

2.Refresh protocols: RefreshAuthorizationPolicyProtocol,
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol,
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl,
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
refreshCallQueue, are implemented by creating a corresponding RPC proxy and
sending the request to remote NN. In the current implementation, these requests
will be sent to a definite NN, no matter it is Active or Standby. Here I
propose that we sent these requests to both NNs.

3.ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.

> Improve DFSAdmin to support HA cluster better
> ---

[jira] [Created] (HDFS-6507) Improve DFSAdmin to support HA cluster better

Zesheng Wu created HDFS-6507:


 Summary: Improve DFSAdmin to support HA cluster better
 Key: HDFS-6507
 URL: https://issues.apache.org/jira/browse/HDFS-6507
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu


Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
1.ClientProtocol
Commands in this category generally implement by calling the corresponding 
function of the DFSClient class, and will call the corresponding remote 
implementation function at the NN side finally. At the NN side, all these 
operations are classified into five categories: UNCHECKED, READ, WRITE, 
CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
allows UNCHECKED operations. In the current implementation of DFSClient, it 
will connect one NN first, if the first NN is not Active and the operation is 
not allowed, it will failover to the second NN. So here comes the problem, some 
of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
operations, and when executing these commands in the DFSAdmin command line, 
they will be sent to a definite NN, no matter it is Active or Standby. This may 
result in two problems: 
a. If the first tried NN is standby, and the operation takes effect only on 
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NN, but it takes effect on 
only one NN. In the future, when there is a NN failover, there may have 
problems.
Here I propose the following improvements:
a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
operations, we should classify it clearly.
b. If the command can not be classified as one of the above four operations, or 
if the command needs to take effect on both NN, we should send the request to 
both Active and Standby NNs.

2.Refresh protocols: RefreshAuthorizationPolicyProtocol, 
RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, 
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
sending the request to remote NN. In the current implementation, these requests 
will be sent to a definite NN, no matter it is Active or Standby. Here I 
propose that we sent these requests to both NNs.

3.ClientDatanodeProtocol
Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL


[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026047#comment-14026047
 ] 

Zesheng Wu commented on HDFS-6382:
--

Thanks [~cmccabe] for your feedback.
bq. For the MR strategy, it seems like this could be parallelized fairly 
easily. For example, if you have 5 MR tasks, you can calculate the hash of each 
path, and then task 1 can do all the paths that are 0 mod 5, task 2 can do all 
the paths that are 1 mod 5, and so forth. MR also doesn't introduce extra 
dependencies since HDFS and MR are packaged together.
You mean that we scan the whole namespace at first and then split it into 5 
pieces according to hash of the path, why do we just complete the work during 
the first scanning process? If I misunderstand your meaning, please point out.

bq. I don't understand what you mean by "the mapreduce strategy will have 
additional overheads." What overheads are you foreseeing?
Possible overheads: Starting a mapreduce job needs to split the input, start an 
 AppMaster, collect result from random machines (Perhaps 'overheads' is not a 
proper word here)

bq. I don't understand what you mean by this. What will be done automatically?
Here "automatically" means we do not have to rely on external tools, the daemon 
itself can manage the work well.

bq. How are you going to implement HA for the standalone daemon?
Good point. As you suggested, one approach is save the state in HDFS and simply 
restart it when it fails. But managing the state is a complex work, I am 
considering how to simplify this. One possible simpler approach is that we can 
consider that the daemon is stateless and simply restart it when if fails. We 
needn't do checkpoint and just scan from the beginning when it restarts. 
Because we can require that the work the daemon does is idempotent, starting 
from the beginning will be harmless. Possible drawbacks of the later approach 
are that it may waste some time and may delay the work, but they are 
acceptable. 

bq. I don't see a lot of discussion of logging and monitoring in general. How 
is the user going to become aware that a file was deleted because of a TTL? Or 
if there is an error during the delete, how will the user know? 
For the simplicity purpose, in the initial version, we will use logs to record 
which file/directory is deleted by TTL, and errors during the deleting process.

bq. Does this need to be an administrator command?
It doesn't need to be an administrator command, user only can setTtl on 
file/directory that they have write permission, and can getTtl on 
file/directory that they have read permission.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
> Attachments: HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage


[ 
https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025285#comment-14025285
 ] 

Zesheng Wu commented on HDFS-6503:
--

Just fix typo, no need to add new tests.

> Fix typo of DFSAdmin restoreFailedStorage
> -
>
> Key: HDFS-6503
> URL: https://issues.apache.org/jira/browse/HDFS-6503
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6503.patch
>
>
> Fix typo: restoreFaileStorage should be restoreFailedStorage



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage


 [ 
https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6503:
-

Attachment: HDFS-6503.patch

> Fix typo of DFSAdmin restoreFailedStorage
> -
>
> Key: HDFS-6503
> URL: https://issues.apache.org/jira/browse/HDFS-6503
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6503.patch
>
>
> Fix typo: restoreFaileStorage should be restoreFailedStorage



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage


 [ 
https://issues.apache.org/jira/browse/HDFS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6503:
-

Status: Patch Available  (was: Open)

> Fix typo of DFSAdmin restoreFailedStorage
> -
>
> Key: HDFS-6503
> URL: https://issues.apache.org/jira/browse/HDFS-6503
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6503.patch
>
>
> Fix typo: restoreFaileStorage should be restoreFailedStorage



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HDFS-6503) Fix typo of DFSAdmin restoreFailedStorage

Zesheng Wu created HDFS-6503:


 Summary: Fix typo of DFSAdmin restoreFailedStorage
 Key: HDFS-6503
 URL: https://issues.apache.org/jira/browse/HDFS-6503
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
Priority: Minor


Fix typo: restoreFaileStorage should be restoreFailedStorage



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6382) HDFS File/Directory TTL