[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2014-11-12 Thread Chris Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208739#comment-14208739 ]

Chris Li commented on HDFS-7008:


Linking issue

 xlator should be closed upon exit from DFSAdmin#genericRefresh()
 

 Key: HDFS-7008
 URL: https://issues.apache.org/jira/browse/HDFS-7008
 Project: Hadoop HDFS
 Issue Type: Bug
 Reporter: Ted Yu
 Assignee: Tsuyoshi OZAWA
 Priority: Minor
 Attachments: HDFS-7008.1.patch


 {code}
 GenericRefreshProtocol xlator =
   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
 // Refresh
 Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
 {code}
 GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
 xlator before return.
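
The request above can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop classes: `StubTranslator` is a hypothetical stand-in for the client-side translator, and `genericRefresh` here only mirrors the shape of the method named in the issue title. It is not the attached patch. The point is simply that a `finally` block closes the translator on both the success path and the failure path.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;

final class RefreshCloseSketch {

    /** Hypothetical stand-in for the translator; it only records whether close() ran. */
    static final class StubTranslator implements Closeable {
        boolean closed = false;

        Collection<String> refresh(String identifier, String[] args) throws IOException {
            if (identifier == null || identifier.isEmpty()) {
                throw new IOException("no identifier given");
            }
            return Arrays.asList(identifier + " refreshed with " + args.length + " arg(s)");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Returns 0 on success and -1 on failure; the translator is closed on both paths. */
    static int genericRefresh(StubTranslator xlator, String identifier, String[] args) {
        try {
            Collection<String> responses = xlator.refresh(identifier, args);
            for (String response : responses) {
                System.out.println(response);
            }
            return 0;
        } catch (IOException e) {
            System.err.println("refresh failed: " + e.getMessage());
            return -1;
        } finally {
            // Runs after the try or catch block finishes, before the method returns.
            xlator.close();
        }
    }

    public static void main(String[] argv) {
        StubTranslator ok = new StubTranslator();
        int okCode = genericRefresh(ok, "demo", new String[] {"a"});
        System.out.println("exit=" + okCode + " closed=" + ok.closed);

        StubTranslator bad = new StubTranslator();
        int badCode = genericRefresh(bad, "", new String[0]);
        System.out.println("exit=" + badCode + " closed=" + bad.closed);
    }
}
```

On Java 7 and later, a try-with-resources statement over the `Closeable` would give the same guarantee with less code.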



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

2014-06-11 Thread Chris Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028758#comment-14028758 ]

Chris Li commented on HDFS-6507:


Yea that's something we discussed: how to handle HA. Current refreshprotos go 
only to the active, and HADOOP-10376 requires the user to manually specify each 
NN to target, so they could refresh one NN or X number of NNs by running X 
number of refresh commands. 

In order to make it more convenient to refresh the NN/RM in an HA 
configuration, we can add a special option to do so, maybe like `dfsadmin 
-refresh allNamenodes key [arg1..argn]` or maybe just `namenode` (and all is 
implicit with HA).

As for the old refresh protocols, sending them to both NNs seems like a good 
idea; it would be bad to have the standby NN take over with outdated configs.

 Improve DFSAdmin to support HA cluster better
 -

 Key: HDFS-6507
 URL: https://issues.apache.org/jira/browse/HDFS-6507
 Project: Hadoop HDFS
 Issue Type: Improvement
 Components: tools
 Affects Versions: 2.4.0
 Reporter: Zesheng Wu
 Assignee: Zesheng Wu

 Currently, the commands supported in DFSAdmin can be classified into three 
 categories according to the protocol used:
 1. ClientProtocol
 Commands in this category are generally implemented by calling the 
 corresponding function of the DFSClient class, which in turn calls the 
 corresponding remote implementation on the NN side. On the NN side, all 
 these operations are classified into five categories: UNCHECKED, READ, WRITE, 
 CHECKPOINT, JOURNAL. The Active NN allows all operations, and the Standby NN 
 allows only UNCHECKED operations. In the current implementation, DFSClient 
 connects to one NN first; if that NN is not Active and the operation is not 
 allowed, it fails over to the second NN. Here is the problem: some of the 
 commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, 
 setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED 
 operations, so when they are run from the DFSAdmin command line they are 
 sent to a single NN, regardless of whether it is Active or Standby. This can 
 cause two problems: 
 a. If the first NN tried is Standby, the operation takes effect only on the 
 Standby NN, which is not the expected result.
 b. If the operation needs to take effect on both NNs, it takes effect on 
 only one. When an NN failover happens later, this may cause problems.
 Here I propose the following improvements:
 a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
 operations, we should classify it clearly.
 b. If the command cannot be classified as one of the above four operations, 
 or if the command needs to take effect on both NNs, we should send the 
 request to both the Active and Standby NNs.
 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
 RefreshUserMappingsProtocol, RefreshCallQueueProtocol
 Commands in this category, including refreshServiceAcl, 
 refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
 refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
 sending the request to the remote NN. In the current implementation, these 
 requests are sent to a single NN, regardless of whether it is Active or 
 Standby. Here I propose that we send these requests to both NNs.
 3. ClientDatanodeProtocol
 Commands in this category are handled correctly; no improvement is needed.
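
The proposal to send a request to both NNs can be sketched roughly as follows. This is only an illustration under stated assumptions: `RefreshTarget`, `refreshAll`, and the `nn1:8020`/`nn2:8020` addresses are hypothetical names, not DFSAdmin or Hadoop APIs, and a real implementation would build one RPC proxy per configured NameNode. The sketch keeps going after a failure so that one unreachable NN does not prevent the other from being refreshed, and it reports which targets failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RefreshAllNamenodesSketch {

    /** Hypothetical per-NameNode refresh call; a real one would wrap an RPC proxy. */
    interface RefreshTarget {
        String address();

        void refresh(String key) throws Exception;
    }

    /** Sends the refresh to every target and returns the addresses that failed. */
    static List<String> refreshAll(List<RefreshTarget> targets, String key) {
        List<String> failed = new ArrayList<>();
        for (RefreshTarget target : targets) {
            try {
                target.refresh(key);
            } catch (Exception e) {
                // Keep going so one unreachable NN does not block the others.
                failed.add(target.address());
            }
        }
        return failed;
    }

    /** Demo helper: a target that succeeds or fails on demand. */
    static RefreshTarget stub(final String address, final boolean healthy) {
        return new RefreshTarget() {
            public String address() {
                return address;
            }

            public void refresh(String key) throws Exception {
                if (!healthy) {
                    throw new Exception(address + " unreachable");
                }
            }
        };
    }

    public static void main(String[] args) {
        List<RefreshTarget> nns = Arrays.asList(stub("nn1:8020", true), stub("nn2:8020", false));
        System.out.println("failed: " + refreshAll(nns, "someKey"));
    }
}
```

Whether a partial failure should make the whole command exit non-zero is a design choice the issue leaves open; returning the list of failed addresses lets the caller decide.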





[jira] [Commented] (HDFS-3544) Ability to use SimpleRegeratingCode to fix missing blocks

2014-01-14 Thread Chris Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871357#comment-13871357 ]

Chris Li commented on HDFS-3544:


Okay, we're very interested in SRC in our clusters and we will work on this 
feature if we can get some momentum. Either way I'll keep you posted.

 Ability to use SimpleRegeratingCode to fix missing blocks
 -

 Key: HDFS-3544
 URL: https://issues.apache.org/jira/browse/HDFS-3544
 Project: Hadoop HDFS
 Issue Type: Improvement
 Components: contrib/raid
 Reporter: dhruba borthakur

 ReedSolomon encoding (n, k) uses n storage nodes and can tolerate n-k 
 failures. Regenerating a block requires access to k blocks, which is a 
 problem when n and k are large. Instead, we can use simple regenerating codes 
 (n, k, f) that first apply ReedSolomon (n, k) and then XOR with stripe size 
 f. A single disk failure then needs to access only f nodes, and f can be 
 very small.
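
To make the repair-cost argument concrete, here is a minimal sketch of the XOR step only. It is not the SimpleRegeneratingCode implementation and it omits the ReedSolomon layer entirely. How the encoded blocks are grouped before the XOR is an assumption of this example, as are the names `XorRepairSketch`, `xor`, and `repair`. It shows that a lost block in a small XOR group can be rebuilt from the surviving members of that group plus the group parity, so the number of blocks read depends on the group size rather than on k.

```java
import java.util.Arrays;

final class XorRepairSketch {

    /** XORs equally sized blocks together, byte by byte. */
    static byte[] xor(byte[][] blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    /** Rebuilds the block at index lost from the other group members and the parity. */
    static byte[] repair(byte[][] group, byte[] parity, int lost) {
        byte[][] inputs = new byte[group.length][];
        for (int i = 0; i < group.length; i++) {
            // The parity block takes the place of the lost block.
            inputs[i] = (i == lost) ? parity : group[i];
        }
        return xor(inputs);
    }

    public static void main(String[] args) {
        byte[][] group = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = xor(group);
        byte[] rebuilt = repair(group, parity, 1);
        System.out.println("rebuilt matches original: " + Arrays.equals(rebuilt, group[1]));
    }
}
```

With a group of three blocks, the repair reads three blocks (two survivors and the parity), regardless of how large n and k are in the outer code.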





[jira] [Commented] (HDFS-3544) Ability to use SimpleRegeratingCode to fix missing blocks

2014-01-08 Thread Chris Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866170#comment-13866170 ]

Chris Li commented on HDFS-3544:


Any updates on this issue? We're interested in trying this out to save space on 
our cold files.

 Ability to use SimpleRegeratingCode to fix missing blocks
 -

 Key: HDFS-3544
 URL: https://issues.apache.org/jira/browse/HDFS-3544
 Project: Hadoop HDFS
 Issue Type: Improvement
 Components: contrib/raid
 Reporter: dhruba borthakur
 Assignee: Weiyan Wang

 ReedSolomon encoding (n, k) uses n storage nodes and can tolerate n-k 
 failures. Regenerating a block requires access to k blocks, which is a 
 problem when n and k are large. Instead, we can use simple regenerating codes 
 (n, k, f) that first apply ReedSolomon (n, k) and then XOR with stripe size 
 f. A single disk failure then needs to access only f nodes, and f can be 
 very small.





[jira] [Commented] (HDFS-5639) rpc scheduler abstraction

2013-12-09 Thread Chris Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843606#comment-13843606 ]

Chris Li commented on HDFS-5639:


Something like this will be needed down the road if HADOOP-9640 is adopted; 
I'll open separate jiras for these enhancements when we're ready.

 rpc scheduler abstraction
 -

 Key: HDFS-5639
 URL: https://issues.apache.org/jira/browse/HDFS-5639
 Project: Hadoop HDFS
 Issue Type: Improvement
 Reporter: Ming Ma
 Attachments: HDFS-5639-2.patch, HDFS-5639.patch


 We have run into various issues in namenode and hbase w.r.t. rpc handling in 
 multi-tenant clusters. Examples are
 https://issues.apache.org/jira/i#browse/HADOOP-9640
 https://issues.apache.org/jira/i#browse/HBASE-8836
 There are different ideas on how to prioritize rpc requests. It could be 
 based on user id, on whether it is a read or a write request, or on a 
 specific rule such as datanode RPCs being more important than client RPCs.
 We want to enable people to implement and experiment with different rpc 
 schedulers.
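
One way to picture the abstraction being asked for is a small plug-in interface with interchangeable policies. The sketch below is hypothetical throughout: `Scheduler`, `Call`, `DatanodeFirst`, `ReadsFirst`, and `next` are names invented for this example and are not taken from the attached patches or from Hadoop. The first policy encodes the datanode-over-client rule mentioned above. The second orders reads before writes, which is an arbitrary choice made only to show a second policy behind the same interface.

```java
import java.util.Arrays;
import java.util.List;

final class RpcSchedulerSketch {

    /** Minimal description of an incoming call; all fields are illustrative. */
    static final class Call {
        final String user;
        final boolean isWrite;
        final boolean fromDatanode;

        Call(String user, boolean isWrite, boolean fromDatanode) {
            this.user = user;
            this.isWrite = isWrite;
            this.fromDatanode = fromDatanode;
        }
    }

    /** Hypothetical plug-in point: calls with lower numbers are served first. */
    interface Scheduler {
        int priority(Call call);
    }

    /** Rule from the description: datanode RPCs outrank client RPCs. */
    static final class DatanodeFirst implements Scheduler {
        public int priority(Call call) {
            return call.fromDatanode ? 0 : 1;
        }
    }

    /** Alternative rule, chosen arbitrarily for the example: reads outrank writes. */
    static final class ReadsFirst implements Scheduler {
        public int priority(Call call) {
            return call.isWrite ? 1 : 0;
        }
    }

    /** Picks the call the given scheduler would serve next; the earliest call wins ties. */
    static Call next(List<Call> pending, Scheduler scheduler) {
        Call best = null;
        for (Call call : pending) {
            if (best == null || scheduler.priority(call) < scheduler.priority(best)) {
                best = call;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Call> pending = Arrays.asList(
            new Call("alice", true, false),
            new Call("dn", true, true),
            new Call("bob", false, false));
        System.out.println("datanode-first serves: " + next(pending, new DatanodeFirst()).user);
        System.out.println("reads-first serves: " + next(pending, new ReadsFirst()).user);
    }
}
```

Swapping the policy object changes the service order without touching the code that selects the next call, which is the kind of experimentation the description asks to enable.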


