[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208739#comment-14208739 ]

Chris Li commented on HDFS-7008:
--------------------------------

Linking issue

> xlator should be closed upon exit from DFSAdmin#genericRefresh()
> ----------------------------------------------------------------
>
>                 Key: HDFS-7008
>                 URL: https://issues.apache.org/jira/browse/HDFS-7008
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Tsuyoshi OZAWA
>            Priority: Minor
>         Attachments: HDFS-7008.1.patch
>
> {code}
> GenericRefreshProtocol xlator =
>     new GenericRefreshProtocolClientSideTranslatorPB(proxy);
>
> // Refresh
> Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
> {code}
> GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
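The leak described above is the classic "resource not closed on every exit path" pattern, and try-with-resources is the idiomatic Java fix. Below is a minimal, self-contained sketch: `FakeTranslator` is a hypothetical stand-in (the real `GenericRefreshProtocolClientSideTranslatorPB` wraps an RPC proxy inside Hadoop), so this only illustrates the shape of the fix, not the actual patch.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Collection;

public class RefreshCloseSketch {
    // Hypothetical stand-in for GenericRefreshProtocolClientSideTranslatorPB.
    static class FakeTranslator implements Closeable {
        Collection<String> refresh(String identifier, String[] args) {
            return Arrays.asList(identifier + ": ok");
        }
        @Override
        public void close() {
            // The real close() would shut down the underlying RPC proxy.
        }
    }

    // try-with-resources guarantees close() runs on every exit path,
    // including early returns and exceptions thrown by refresh().
    static Collection<String> genericRefresh(String identifier, String[] args) {
        try (FakeTranslator xlator = new FakeTranslator()) {
            return xlator.refresh(identifier, args);
        }
    }

    public static void main(String[] args) {
        System.out.println(genericRefresh("refreshCallQueue", new String[0]));
    }
}
```

A plain try/finally calling `xlator.close()` would work equally well and matches the style of older Hadoop code that predates try-with-resources.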
[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better
[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028758#comment-14028758 ]

Chris Li commented on HDFS-6507:
--------------------------------

Yea, that's something we discussed: how to handle HA. Current refresh protos go only to the active NN, and HADOOP-10376 requires the user to manually specify each NN to target, so they could refresh one NN or X number of NNs by running X number of refresh commands.

To make it more convenient to refresh the NN/RM in an HA configuration, we could add a special option to do so, maybe like `dfsadmin -refresh allNamenodes key [arg1..argn]`, or maybe just `namenode` (with "all" implicit under HA).

As for the old refresh protocols, it seems like a good idea; it would be bad to have the standby NN take over with outdated configs.

> Improve DFSAdmin to support HA cluster better
> ---------------------------------------------
>
>                 Key: HDFS-6507
>                 URL: https://issues.apache.org/jira/browse/HDFS-6507
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the corresponding function of the DFSClient class, which in turn calls the corresponding remote implementation on the NN side. At the NN side, all these operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL. The Active NN allows all operations, while the Standby NN only allows UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed there, it fails over to the second NN.
> Here comes the problem: some of the commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED operations, so when executed from the DFSAdmin command line they are sent to one definite NN, whether it is Active or Standby. This may result in two problems:
> a. If the first NN tried is the Standby, the operation takes effect only on the Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on only one, there may be problems after a future NN failover.
> Here I propose the following improvements:
> a. If a command can be classified as one of the READ/WRITE/CHECKPOINT/JOURNAL operations, we should classify it clearly.
> b. If a command cannot be classified as one of the above four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating a corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to one definite NN, whether it is Active or Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly; no improvement is needed.
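The proposal to fan refresh requests out to both NameNodes, rather than stopping at whichever NN a proxy happens to reach, can be sketched as a simple loop that collects a per-NN outcome. Everything here is hypothetical illustration (the `RefreshAction` interface and the addresses are invented for the sketch); the real code would build an RPC proxy per NameNode address from the HA configuration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RefreshAllNamenodes {
    // Hypothetical per-NN refresh action; in real code this would create an
    // RPC proxy for the given NameNode address and invoke the refresh protocol.
    interface RefreshAction {
        String refresh(String nnAddress) throws Exception;
    }

    // Send the refresh to every NameNode (Active and Standby alike) and
    // record per-NN outcomes instead of aborting on the first failure, so
    // one unreachable NN does not leave the other with stale config.
    static Map<String, String> refreshAll(List<String> nnAddresses, RefreshAction action) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String nn : nnAddresses) {
            try {
                results.put(nn, action.refresh(nn));
            } catch (Exception e) {
                results.put(nn, "FAILED: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(refreshAll(List.of("nn1:8020", "nn2:8020"), nn -> "refreshed"));
    }
}
```

Reporting partial failure (rather than all-or-nothing) matters here: an admin needs to know which NN still holds the old ACLs or call-queue settings.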
[jira] [Commented] (HDFS-3544) Ability to use SimpleRegeratingCode to fix missing blocks
[ https://issues.apache.org/jira/browse/HDFS-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871357#comment-13871357 ]

Chris Li commented on HDFS-3544:
--------------------------------

Okay, we're very interested in SRC in our clusters and we will work on this feature if we can get some momentum. Either way I'll keep you posted.

> Ability to use SimpleRegeratingCode to fix missing blocks
> ----------------------------------------------------------
>
>                 Key: HDFS-3544
>                 URL: https://issues.apache.org/jira/browse/HDFS-3544
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: dhruba borthakur
>
> Reed-Solomon encoding (n, k) has n storage nodes and can tolerate n-k failures. Regenerating a block needs to access k blocks, which is a problem when n and k are large. Instead, we can use simple regenerating codes (n, k, f) that first do Reed-Solomon (n, k) and then XOR with stripe size f. Then a single disk failure needs to access only f nodes, and f can be very small.
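The repair-cost argument can be made concrete with just the XOR layer of the scheme. A minimal sketch, using invented toy blocks: with f blocks per XOR stripe plus one parity block, recovering a single lost block reads only the other f blocks of that stripe, instead of the k blocks a plain Reed-Solomon repair would read.

```java
public class XorRepairSketch {
    // XOR a set of equal-length blocks together, byte by byte.
    static byte[] xor(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < out.length; i++)
                out[i] ^= b[i];
        return out;
    }

    public static void main(String[] args) {
        // One XOR stripe with f = 3 data blocks (toy values) and its parity.
        byte[] b0 = {1, 2, 3}, b1 = {4, 5, 6}, b2 = {7, 8, 9};
        byte[] parity = xor(b0, b1, b2);
        // Losing b1: XOR the surviving f blocks of this stripe to rebuild it,
        // touching f nodes rather than k as in plain Reed-Solomon repair.
        byte[] recovered = xor(b0, b2, parity);
        System.out.println(java.util.Arrays.equals(recovered, b1)); // true
    }
}
```

This is only the cheap local-repair layer; the Reed-Solomon (n, k) layer underneath is what still provides tolerance of up to n-k failures.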
[jira] [Commented] (HDFS-3544) Ability to use SimpleRegeratingCode to fix missing blocks
[ https://issues.apache.org/jira/browse/HDFS-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866170#comment-13866170 ]

Chris Li commented on HDFS-3544:
--------------------------------

Any updates on this issue? We're interested in trying this out to save space on our cold files.

> Ability to use SimpleRegeratingCode to fix missing blocks
> ----------------------------------------------------------
>
>                 Key: HDFS-3544
>                 URL: https://issues.apache.org/jira/browse/HDFS-3544
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: dhruba borthakur
>            Assignee: Weiyan Wang
>
> Reed-Solomon encoding (n, k) has n storage nodes and can tolerate n-k failures. Regenerating a block needs to access k blocks, which is a problem when n and k are large. Instead, we can use simple regenerating codes (n, k, f) that first do Reed-Solomon (n, k) and then XOR with stripe size f. Then a single disk failure needs to access only f nodes, and f can be very small.
[jira] [Commented] (HDFS-5639) rpc scheduler abstraction
[ https://issues.apache.org/jira/browse/HDFS-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843606#comment-13843606 ]

Chris Li commented on HDFS-5639:
--------------------------------

Something like this will be needed down the road if HADOOP-9640 is adopted; I'll open separate jiras for these enhancements when we're ready.

> rpc scheduler abstraction
> -------------------------
>
>                 Key: HDFS-5639
>                 URL: https://issues.apache.org/jira/browse/HDFS-5639
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-5639-2.patch, HDFS-5639.patch
>
> We have run into various issues in namenode and hbase w.r.t. rpc handling in multi-tenant clusters. Examples are:
> https://issues.apache.org/jira/i#browse/HADOOP-9640
> https://issues.apache.org/jira/i#browse/HBASE-8836
> There are different ideas on how to prioritize rpc requests. It could be based on user id, or on whether it is a read request or a write request, or it could use a specific rule, such as treating a datanode's RPC as more important than a client RPC. We want to enable people to implement and experiment with different rpc schedulers.
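The abstraction being asked for is small: a pluggable policy that maps an incoming call to a priority level, leaving the queueing machinery unchanged. A minimal sketch under invented names (`RpcScheduler`, `UserCountScheduler`, and its threshold policy are all hypothetical, not the interface ultimately committed to Hadoop):

```java
import java.util.HashMap;
import java.util.Map;

public class SchedulerSketch {
    // Hypothetical pluggable contract: map a call to a priority level,
    // 0 being highest. Implementations could key off user id, read vs.
    // write, or caller role (e.g. DataNode RPCs outrank client RPCs).
    interface RpcScheduler {
        int getPriorityLevel(String user, boolean isWrite);
    }

    // One possible policy: users who exceed a call-count threshold are
    // demoted one level, and writes cost one level more than reads.
    static class UserCountScheduler implements RpcScheduler {
        private final Map<String, Integer> callCounts = new HashMap<>();
        private final int threshold;

        UserCountScheduler(int threshold) {
            this.threshold = threshold;
        }

        public int getPriorityLevel(String user, boolean isWrite) {
            int n = callCounts.merge(user, 1, Integer::sum);  // running per-user count
            int level = n > threshold ? 1 : 0;
            return isWrite ? level + 1 : level;
        }
    }

    public static void main(String[] args) {
        RpcScheduler s = new UserCountScheduler(2);
        System.out.println(s.getPriorityLevel("alice", false)); // 0
        System.out.println(s.getPriorityLevel("alice", false)); // 0
        System.out.println(s.getPriorityLevel("alice", false)); // 1 (over threshold)
    }
}
```

Keeping the policy behind an interface like this is what lets the HADOOP-9640-style history-based scheduler and simpler read/write or role-based rules be swapped without touching the RPC server.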