[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709733#comment-13709733
 ] 

Daryn Sharp commented on HDFS-4942:
---

I think the RPC layer is the best place to transparently make the change, which 
has the nice trait of providing the capability to other projects' RPC servers.

Recent RPCv9 changes have already laid the groundwork for multiplexing RPC 
connections.  However, a streamId is not in the header for differentiation of 
multiplexed streams.  The clientId may work nicely in this case if it is 
guaranteed to be unique, at least per-connection.

 Add retry cache support in Namenode
 ---

 Key: HDFS-4942
 URL: https://issues.apache.org/jira/browse/HDFS-4942
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: HDFSRetryCache.pdf


 In the current HA mechanism with FailoverProxyProvider, and in non-HA setups 
 with RetryProxy, a request is retried from the RPC layer. If the retried 
 request has already been processed at the namenode, the subsequent attempts 
 fail for non-idempotent operations such as create, append, delete, rename, 
 etc. This causes application failures during HA failover, network issues, etc.
 This jira proposes adding a retry cache at the namenode to handle these 
 failures. More details are in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709972#comment-13709972
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

Using clientId for multiplexing RPC connections is an interesting use case. 
Check the uniqueness guarantees though.

 The current solution will be useful for all the other applications

Is that a hypothetical opportunity, or do you have any particular use cases in 
mind for YARN? It would be good to know.

 avoid incompatible RPC changes that affect sub-projects
 I am not sure what you mean by this.

I mean that you are building a retry cache for HDFS and making changes 
incompatible for all other projects (rather than for HDFS only). So I am trying 
to understand what value it brings to others.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710081#comment-13710081
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. Check the uniqueness guarantees though.
Every RPC client is uniquely identified. That means all the requests from a 
single client can be identified as belonging to the same session.

bq. Is that a hypothetical opportunity or you have any particular use cases in 
mind for Yarn? Would be good to know.
It is not hypothetical. For stateful restarts of the ResourceManager, this will 
be used to build a retry cache for non-idempotent requests. Similarly, other 
application masters, such as those for MapReduce and Tez, would make use of it.

bq. I mean that you are building a retry cache for HDFS and making changes 
incompatible for all other projects (rather than for HDFS only). So I am trying 
to understand what value it brings to others.
The above comment answers the value it brings to others. You keep referring to 
incompatible changes. 2.1.0 is incompatible! This is going to go with 2.1.0. 
Also, whether you make only HDFS incompatible or the entire Hadoop incompatible, 
applications need to deal with it anyway, given that pretty much everyone up 
the stack uses HDFS.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710193#comment-13710193
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

 For stateful restarts of RM

Sorry I didn't understand what that means.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709496#comment-13709496
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. I was thinking about it and I feel that adding the retry-related fields and 
flags in the RPC layer is not the best way.
Most RPC protocols require a unique request identifier/message ID/transaction 
ID for execute-at-most-once semantics and for implementing a retry cache. Some 
protocols started out with a simpler xid (a 32-bit number in ONC RPC), much 
like the callId in Hadoop RPC, and migrated to a different request-identifier 
format because the previous scheme did not guarantee uniqueness. See 
http://ietf.10.n7.nabble.com/Sessions-RPC-XID-usage-td160272.html.

The initial proposal in HADOOP-9688 was to leave the callId field as is and add 
a new requestId field that guarantees uniqueness. Later, if you follow that 
jira, we decided on the new scheme: clientId + callId is the unique request 
identifier. This avoids having to generate a unique requestId for every request.

We are also adding a retry count, because it optimizes retry cache lookup on 
the server side. It also lets us provide more metrics, such as how many 
requests are retried. This I think is useful.
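
To make the clientId + callId keying scheme concrete, here is a hypothetical sketch of a server-side cache indexed by the 16-byte clientId plus the callId. The RetryCacheSketch and CacheKey names are invented for illustration; this is not Hadoop's actual RetryCache implementation.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a server-side retry cache keyed by (clientId, callId).
class RetryCacheSketch {

  // Composite key: the 16-byte clientId plus the per-client callId.
  static final class CacheKey {
    private final byte[] clientId; // 16-byte UUID assigned per RPC client
    private final int callId;      // increases per call; reused on retries

    CacheKey(byte[] clientId, int callId) {
      this.clientId = clientId.clone();
      this.callId = callId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey k = (CacheKey) o;
      return callId == k.callId && Arrays.equals(clientId, k.clientId);
    }

    @Override
    public int hashCode() {
      return 31 * Arrays.hashCode(clientId) + callId;
    }
  }

  private final ConcurrentHashMap<CacheKey, Object> cache = new ConcurrentHashMap<>();

  // Record the response of a successfully completed non-idempotent call.
  void recordResponse(byte[] clientId, int callId, Object response) {
    cache.put(new CacheKey(clientId, callId), response);
  }

  // For a retried request, return the cached response instead of re-executing.
  Object lookup(byte[] clientId, int callId) {
    return cache.get(new CacheKey(clientId, callId));
  }
}
```

Under this sketch, a retry count in the header would let the server skip the cache lookup entirely for first attempts and only consult it when the count is nonzero.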

bq. Mostly because the retry logic is intended for a few HDFS methods only, 
while the new field and flag will be serialized and de-serialized by everybody 
including DataNodes, Balancers, MapReduce and Yarn.

The current mechanism can be used by YARN and YARN-related ApplicationMasters. 
In fact, having others rebuild the same mechanism at the application layer 
seems unnecessary to me.

The new field is a 16-byte field. I think its impact should be in the noise 
compared to the existing RPC payload, the RTT of the RPC request and response, 
and the time required for handling the RPC request itself. That is why I 
clearly stated in a comment on HADOOP-9688 that optimizing away a 16-byte field 
seems unnecessary:
https://issues.apache.org/jira/browse/HADOOP-9688?focusedCommentId=13698421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698421

bq. constrain changes to HDFS only
The current solution will be useful for all the other applications, instead of 
everyone building an equivalent of the client name. To do that, existing APIs 
would need to be changed for YARN and YARN applications. In fact, for YARN 
there is no equivalent of DFSClient, and expecting AM developers to get unique 
client-name generation right would be error-prone.

I would also argue that the DFSClient name belongs to the application layer and 
API. It was an identity used for tracking the lease. I do not see changing it 
and using it for the retry cache as a clean solution.

bq. avoid incompatible RPC changes that affect sub-projects
I am not sure what you mean by this. I plan to commit the bits required for 
protocol compatibility to 2.1.0-beta. As you know, 2.1.0-beta is already 
incompatible with older releases. Also, for the sake of argument, let's assume 
that we want to use the client name instead of the clientId. That means every 
non-idempotent method signature needs to change to include clientName as a 
parameter, much like the create() and append() calls. Those changes are also 
incompatible.

bq. limit serialization overhead to only the methods involved in the retry.
As I said earlier, the cost of a couple of bytes should be noise compared to 
the cost of an end-to-end RPC, the countless strings generated by logging on 
the client and server side, and the numerous short-lived temporary objects 
created in Hadoop daemons.

bq. This will require making clientName unique as many recently advocated for.
From what I understood, this was brought up because the previous mechanism 
required a unique client name. I am not sure people would have advocated for it 
outside the context of that change.


[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707492#comment-13707492
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

I was thinking about it and I feel that adding the retry-related fields and 
flags in the RPC layer is not the best way. Mostly because the retry logic is 
intended for a few HDFS methods only, while the new field and flag will be 
serialized and de-serialized by everybody including DataNodes, Balancers, 
MapReduce and Yarn.

I think a better way would be to use clientName + callId as a key to index 
the retry cache entries. This will 
- constrain changes to HDFS only
- avoid incompatible RPC changes that affect sub-projects
- limit serialization overhead to only the methods involved in the retry.

This will require making clientName unique as many recently advocated for.
Would that sound reasonable?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706406#comment-13706406
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

Suresh, it looks like the issue is rapidly spreading into multiple jiras. I 
counted 8 by now.
If you think it will take more than that, should we consider moving this to a 
branch?
Otherwise porting the feature to other branches becomes challenging, as 
cherry-picking multiple patches interleaved with other changes is non-trivial.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-11 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706469#comment-13706469
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. Suresh, it looks like the issue is rapidly spreading into multiple jiras. I 
counted 8 by now.
I have 1 or 2 more jiras to add to this.

The changes are being done in small increments. Each part can be done 
independently, without leaving trunk in a non-working state; hence the work is 
happening on trunk. Doing it in small increments helps multiple people 
collaborate on this, instead of one person working on a single large, 
hard-to-review patch.

bq. Otherwise porting of the feature in other branches becomes challenging as 
cherry picking of multiple patches interleaved with other changes is non-trivial
Sorry I do not understand what the issue is. How does it make porting of the 
feature hard?




[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698674#comment-13698674
 ] 

Bikas Saha commented on HDFS-4942:
--

Some clarifications would be good:
1) A user runs the create command using the CLI. Internally, the command is 
retried many times over RPC. Will these retries have the same UUID, and will 
they get the response from the retry cache? I think yes.
2) The client in 1) above tries many times and fails. The user retries the CLI 
command again, which generates a new set of retries. Will these have the same 
UUID as the retries in 1) above? Will they get the response from the retry 
cache?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-03 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698679#comment-13698679
 ] 

Suresh Srinivas commented on HDFS-4942:
---

The UUID remains the same only for retried/retransmitted requests issued 
automatically by the RPC layer, not for a subsequent attempt the 
user/application makes.
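
A hypothetical client-side sketch of this behavior (the RetryIdSketch class and newCall() method are invented names, not Hadoop's API): the clientId is fixed for the life of the client, a fresh callId is allocated per logical call, and the RPC layer would re-send the same identifier on every automatic retry of that call.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side sketch: one clientId per client instance, one
// callId per logical call. Automatic retries reuse the same identifier;
// a new user-level attempt allocates a new callId.
class RetryIdSketch {
  private final UUID clientId = UUID.randomUUID(); // fixed for client lifetime
  private final AtomicInteger nextCallId = new AtomicInteger();

  // Start a new logical call: allocate a fresh callId. The returned
  // identifier is what every automatic retry of this call would carry.
  String newCall() {
    return clientId + ":" + nextCallId.getAndIncrement();
  }
}
```

So in Bikas's scenario, all RPC-layer retries of one create carry one identifier and hit the retry cache, while the user's second CLI attempt calls newCall() again and is treated as a new request.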




[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698423#comment-13698423
 ] 

Chris Nauroth commented on HDFS-4942:
-

The proposal looks good, and I'll be interested to see the analysis of the 
individual RPC calls.  Reminder on something that came up in offline 
conversation: it appears that we can change 
{{ClientProtocol#getDataEncryptionKey}} to annotate it as Idempotent.  It 
doesn't appear to mutate state.  If a retry causes creation of multiple keys, 
that shouldn't be a problem.
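
The idempotency marking described above could be sketched as a runtime annotation that the retry policy inspects. This is a hypothetical stand-in: the Idempotent annotation and ClientProtocolSketch interface here are illustrative, not Hadoop's actual classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for an idempotency marker: retried calls to an
// annotated method can safely be re-executed, so they need no retry-cache
// entry. Illustrative only; not Hadoop's actual annotation or protocol.
class IdempotentSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Idempotent {}

  // getDataEncryptionKey only regenerates a key, so re-execution is harmless;
  // create mutates namespace state, so it relies on the retry cache instead.
  interface ClientProtocolSketch {
    @Idempotent
    byte[] getDataEncryptionKey();

    void create(String path);
  }

  // A retry policy could inspect the annotation at runtime to decide
  // whether a failed call is safe to reissue blindly.
  static boolean isIdempotent(Class<?> protocol, String method, Class<?>... params) {
    try {
      return protocol.getMethod(method, params).isAnnotationPresent(Idempotent.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```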

{quote}
Given that we plan on adding a unique identifier to every RPC request, should 
we get this change done before 2.1.0-beta rc2 is built? This way 2.1.0-beta 
clients can utilize retry cache as well.
{quote}

+1 for this idea.  Adding the UUID now would be a low-risk change.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698425#comment-13698425
 ] 

Suresh Srinivas commented on HDFS-4942:
---

I created HADOOP-9688 to add a unique request ID to RPC requests. I have also 
posted an early patch there.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696376#comment-13696376
 ] 

Bikas Saha commented on HDFS-4942:
--

Suresh, do you mean non-idempotent requests like create etc. in the description 
and point 1 of your comment?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-30 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696402#comment-13696402
 ] 

Suresh Srinivas commented on HDFS-4942:
---

[~bikassaha] Thanks for pointing that out. Fixed the description and the 
comment.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695837#comment-13695837
 ] 

Suresh Srinivas commented on HDFS-4942:
---

This issue was discussed at Hadoop Summit 2013, in the HDFS Design Lounge. The 
folks who participated in the discussion agreed on the need for a retry cache. 
Here are some high-level decisions:
# A retry cache will be added to the namenode for non-idempotent operations. An 
entry in the retry cache will be retained for a configurable period of time. 
The cache will track non-idempotent requests when they complete successfully.
# To identify a request uniquely, the current RPC call ID is not sufficient. 
Additional identifiers will be added to RPC to distinguish requests coming from 
the same client, from two clients on the same machine, and from two clients on 
different machines.

I will post a design by early next week.
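
Decision 1 above could be sketched as a retention-bounded cache. This is a hypothetical illustration, not the eventual design: ExpiringRetryCache is an invented name, and a clock is injected so expiry is deterministic rather than tied to wall time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of a retry cache whose entries expire after a
// configurable lifetime (decision 1 above).
class ExpiringRetryCache {
  private final long lifetimeMillis;
  private final LongSupplier clock; // injected millisecond clock
  // Insertion-ordered: oldest entries come first, so expiry can stop at
  // the first still-live entry.
  private final LinkedHashMap<String, Long> entries = new LinkedHashMap<>();

  ExpiringRetryCache(long lifetimeMillis, LongSupplier clock) {
    this.lifetimeMillis = lifetimeMillis;
    this.clock = clock;
  }

  // Record a successfully completed non-idempotent request.
  synchronized void record(String requestId) {
    expire();
    entries.remove(requestId); // re-insert to keep oldest-first ordering
    entries.put(requestId, clock.getAsLong());
  }

  // True if the request was already processed within the retention window.
  synchronized boolean wasProcessed(String requestId) {
    expire();
    return entries.containsKey(requestId);
  }

  private void expire() {
    long cutoff = clock.getAsLong() - lifetimeMillis;
    Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() < cutoff) {
        it.remove();
      } else {
        break; // everything after this entry is newer
      }
    }
  }
}
```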
