[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709733#comment-13709733
 ] 

Daryn Sharp commented on HDFS-4942:
---

I think the RPC layer is the best place to transparently make the change, which 
has the nice trait of providing the capability to other projects' RPC servers.

Recent RPCv9 changes have already laid the groundwork for multiplexing RPC 
connections.  However, a streamId is not in the header for differentiation of 
multiplexed streams.  The clientId may work nicely in this case if it is 
guaranteed to be unique, at least per-connection.

 Add retry cache support in Namenode
 ---

 Key: HDFS-4942
 URL: https://issues.apache.org/jira/browse/HDFS-4942
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: HDFSRetryCache.pdf


 In the current HA mechanism with FailoverProxyProvider, and in non-HA setups 
 with RetryProxy, a request is retried from the RPC layer. If the retried 
 request has already been processed at the namenode, the subsequent attempts 
 fail for non-idempotent operations such as create, append, delete, rename, 
 etc. This causes application failures during HA failover, network issues, etc.
 This jira proposes adding a retry cache at the namenode to handle these 
 failures. More details are in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709972#comment-13709972
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

Using clientId for multiplexing RPC connections is an interesting use case. 
Check the uniqueness guarantees though.

 The current solution will be useful for all the other applications

Is that a hypothetical opportunity, or do you have any particular use cases in 
mind for YARN? It would be good to know.

 avoid incompatible RPC changes that affect sub-projects
 I am not sure what you mean by this.

I mean that you are building a retry cache for HDFS and making changes 
incompatible for all other projects (rather than for HDFS only). So I am trying 
to understand what value it brings to others.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710081#comment-13710081
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. Check the uniqueness guarantees though.
Every RPC client is uniquely identified. That means all the requests from a 
single client can be identified as belonging to the same session.

bq. Is that a hypothetical opportunity or you have any particular use cases in 
mind for Yarn? Would be good to know.
It is not hypothetical. For stateful restarts of the ResourceManager, this will 
be used to build a retry cache for non-idempotent requests. Similarly, other 
application masters, such as those for MapReduce and Tez, would make use of it.

bq. I mean that you are building a retry cache for HDFS and making changes 
incompatible for all other projects (rather than for HDFS only). So I am trying 
to understand what value it brings to others.
The above comment answers the value it brings to others. You keep referring to 
incompatible changes. 2.1.0 is incompatible! This is going to go with 2.1.0. 
Also, whether you make only HDFS incompatible or the entire Hadoop incompatible, 
applications need to deal with it anyway, given that pretty much everyone up 
the stack uses HDFS.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710193#comment-13710193
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

 For stateful restarts of RM

Sorry I didn't understand what that means.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709496#comment-13709496
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. I was thinking about it and I feel that adding the retry-related fields and 
flags in the RPC layer is not the best way.
Most RPC protocols require a unique request identifier/message ID/transaction 
ID for execute-at-most-once semantics and for implementing a retry cache. Some 
protocols started out with a simpler xid (a 32-bit number in ONC RPC), much 
like the callId in Hadoop RPC, and migrated to a different request-identifier 
format because the previous scheme did not guarantee uniqueness. See 
http://ietf.10.n7.nabble.com/Sessions-RPC-XID-usage-td160272.html.

The initial proposal in HADOOP-9688 was to leave the callId field as is and add 
a new requestId field that guarantees uniqueness. Later, if you follow that 
jira, we decided on the new scheme: clientId + callId is the unique request 
identifier. This avoids having to generate a unique requestId for every request.

We are also adding a retry count, because it optimizes retry cache lookup on 
the server side. It also lets us provide more metrics, such as how many 
requests are retried. This I think is useful.
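
To make the clientId + callId keying scheme concrete, here is a hypothetical sketch of a server-side cache indexed by the 16-byte clientId plus the callId. The RetryCacheSketch and CacheKey names are invented for illustration; this is not Hadoop's actual RetryCache implementation.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a server-side retry cache keyed by (clientId, callId).
class RetryCacheSketch {

  // Composite key: the 16-byte clientId plus the per-client callId.
  static final class CacheKey {
    private final byte[] clientId; // 16-byte UUID assigned per RPC client
    private final int callId;      // increases per call; reused on retries

    CacheKey(byte[] clientId, int callId) {
      this.clientId = clientId.clone();
      this.callId = callId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey k = (CacheKey) o;
      return callId == k.callId && Arrays.equals(clientId, k.clientId);
    }

    @Override
    public int hashCode() {
      return 31 * Arrays.hashCode(clientId) + callId;
    }
  }

  private final ConcurrentHashMap<CacheKey, Object> cache = new ConcurrentHashMap<>();

  // Record the response of a successfully completed non-idempotent call.
  void recordResponse(byte[] clientId, int callId, Object response) {
    cache.put(new CacheKey(clientId, callId), response);
  }

  // For a retried request, return the cached response instead of re-executing.
  Object lookup(byte[] clientId, int callId) {
    return cache.get(new CacheKey(clientId, callId));
  }
}
```

Under this sketch, a retry count in the header would let the server skip the cache lookup entirely for first attempts and only consult it when the count is nonzero.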

bq. Mostly because the retry logic is intended for a few HDFS methods only, 
while the new field and flag will be serialized and de-serialized by everybody 
including DataNodes, Balancers, MapReduce and Yarn.

The current mechanism can be used by YARN and YARN-related ApplicationMasters. 
In fact, having others rebuild the same mechanism at the application layer 
seems unnecessary to me.

The new field is a 16-byte field. I think its impact should be in the noise 
compared to the existing RPC payload, the RTT of the RPC request and response, 
and the time required for handling the RPC request itself. That is why I 
clearly stated in a comment on HADOOP-9688 that optimizing away a 16-byte field 
seems unnecessary:
https://issues.apache.org/jira/browse/HADOOP-9688?focusedCommentId=13698421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698421

bq. constrain changes to HDFS only
The current solution will be useful for all the other applications, instead of 
everyone building an equivalent of the client name. To do that, existing APIs 
would need to be changed for YARN and YARN applications. In fact, for YARN 
there is no equivalent of DFSClient, and expecting AM developers to get unique 
client-name generation right would be error-prone.

I would also argue that the DFSClient name belongs to the application layer and 
API. It was an identity used for tracking the lease. I do not see changing it 
and using it for the retry cache as a clean solution.

bq. avoid incompatible RPC changes that affect sub-projects
I am not sure what you mean by this. I plan to commit the bits required for 
protocol compatibility to 2.1.0-beta. As you know, 2.1.0-beta is already 
incompatible with older releases. Also, for the sake of argument, let's assume 
that we want to use the client name instead of the clientId. That means every 
non-idempotent method signature needs to change to include clientName as a 
parameter, much like the create() and append() calls. Those changes are also 
incompatible.

bq. limit serialization overhead to only the methods involved in the retry.
As I said earlier, the cost of a couple of bytes should be noise compared to 
the cost of an end-to-end RPC, the countless strings generated by logging on 
the client and server side, and the numerous short-lived temporary objects 
created in Hadoop daemons.

bq. This will require making clientName unique as many recently advocated for.
From what I understood, this was brought up because the previous mechanism 
required a unique client name. I am not sure people would have advocated for it 
outside the context of that change.


[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707492#comment-13707492
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

I was thinking about it and I feel that adding the retry-related fields and 
flags in the RPC layer is not the best way. Mostly because the retry logic is 
intended for a few HDFS methods only, while the new field and flag will be 
serialized and de-serialized by everybody including DataNodes, Balancers, 
MapReduce and Yarn.

I think a better way would be to use clientName + callId as a key to index 
the retry cache entries. This will 
- constrain changes to HDFS only
- avoid incompatible RPC changes that affect sub-projects
- limit serialization overhead to only the methods involved in the retry.

This will require making clientName unique as many recently advocated for.
Would that sound reasonable?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706406#comment-13706406
 ] 

Konstantin Shvachko commented on HDFS-4942:
---

Suresh, it looks like the issue is rapidly spreading into multiple jiras. I 
counted 8 by now.
If you think it will take more than that, should we consider moving this to a 
branch?
Otherwise porting the feature to other branches becomes challenging, as 
cherry-picking multiple patches interleaved with other changes is non-trivial.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-11 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706469#comment-13706469
 ] 

Suresh Srinivas commented on HDFS-4942:
---

bq. Suresh, it looks like the issue is rapidly spreading into multiple jiras. I 
counted 8 by now.
I have 1 or 2 more jiras to add to this.

The changes are being done in small increments. Each part can be done 
independently, without leaving trunk in a non-working state; hence the work is 
happening on trunk. Doing it in small increments helps multiple people 
collaborate on this, instead of one person working on a single large, 
hard-to-review patch.

bq. Otherwise porting of the feature in other branches becomes challenging as 
cherry picking of multiple patches interleaved with other changes is non-trivial
Sorry I do not understand what the issue is. How does it make porting of the 
feature hard?




[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698674#comment-13698674
 ] 

Bikas Saha commented on HDFS-4942:
--

Some clarifications would be good:
1) A user runs the create command using the CLI. Internally, the command is 
retried many times over RPC. Will these retries have the same UUID, and will 
they get the response from the retry cache? I think yes.
2) The client in 1) above tries many times and fails. The user retries the CLI 
command again, which generates a new set of retries. Will these have the same 
UUID as the retries in 1) above? Will they get the response from the retry 
cache?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-03 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698679#comment-13698679
 ] 

Suresh Srinivas commented on HDFS-4942:
---

The UUID remains the same only for retried/retransmitted requests issued 
automatically by the RPC layer, not for a subsequent attempt the 
user/application makes.
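
A hypothetical client-side sketch of this behavior (the RetryIdSketch class and newCall() method are invented names, not Hadoop's API): the clientId is fixed for the life of the client, a fresh callId is allocated per logical call, and the RPC layer would re-send the same identifier on every automatic retry of that call.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side sketch: one clientId per client instance, one
// callId per logical call. Automatic retries reuse the same identifier;
// a new user-level attempt allocates a new callId.
class RetryIdSketch {
  private final UUID clientId = UUID.randomUUID(); // fixed for client lifetime
  private final AtomicInteger nextCallId = new AtomicInteger();

  // Start a new logical call: allocate a fresh callId. The returned
  // identifier is what every automatic retry of this call would carry.
  String newCall() {
    return clientId + ":" + nextCallId.getAndIncrement();
  }
}
```

So in Bikas's scenario, all RPC-layer retries of one create carry one identifier and hit the retry cache, while the user's second CLI attempt calls newCall() again and is treated as a new request.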




[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698423#comment-13698423
 ] 

Chris Nauroth commented on HDFS-4942:
-

The proposal looks good, and I'll be interested to see the analysis of the 
individual RPC calls.  Reminder on something that came up in offline 
conversation: it appears that we can change 
{{ClientProtocol#getDataEncryptionKey}} to annotate it as Idempotent.  It 
doesn't appear to mutate state.  If a retry causes creation of multiple keys, 
that shouldn't be a problem.
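
The idempotency marking described above could be sketched as a runtime annotation that the retry policy inspects. This is a hypothetical stand-in: the Idempotent annotation and ClientProtocolSketch interface here are illustrative, not Hadoop's actual classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for an idempotency marker: retried calls to an
// annotated method can safely be re-executed, so they need no retry-cache
// entry. Illustrative only; not Hadoop's actual annotation or protocol.
class IdempotentSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Idempotent {}

  // getDataEncryptionKey only regenerates a key, so re-execution is harmless;
  // create mutates namespace state, so it relies on the retry cache instead.
  interface ClientProtocolSketch {
    @Idempotent
    byte[] getDataEncryptionKey();

    void create(String path);
  }

  // A retry policy could inspect the annotation at runtime to decide
  // whether a failed call is safe to reissue blindly.
  static boolean isIdempotent(Class<?> protocol, String method, Class<?>... params) {
    try {
      return protocol.getMethod(method, params).isAnnotationPresent(Idempotent.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```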

{quote}
Given that we plan on adding a unique identifier to every RPC request, should 
we get this change done before 2.1.0-beta rc2 is built? This way 2.1.0-beta 
clients can utilize retry cache as well.
{quote}

+1 for this idea.  Adding the UUID now would be a low-risk change.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-07-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698425#comment-13698425
 ] 

Suresh Srinivas commented on HDFS-4942:
---

I created HADOOP-9688 to add a unique request ID to RPC requests. I have also 
posted an early patch there.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696376#comment-13696376
 ] 

Bikas Saha commented on HDFS-4942:
--

Suresh, do you mean non-idempotent requests like create etc. in the description 
and point 1 of your comment?



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-30 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696402#comment-13696402
 ] 

Suresh Srinivas commented on HDFS-4942:
---

[~bikassaha] Thanks for pointing that out. Fixed the description and the 
comment.



[jira] [Commented] (HDFS-4942) Add retry cache support in Namenode

2013-06-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695837#comment-13695837
 ] 

Suresh Srinivas commented on HDFS-4942:
---

This issue was discussed at Hadoop Summit 2013, in the HDFS Design Lounge. The 
folks who participated in the discussion agreed on the need for a retry cache. 
Here are some high-level decisions:
# A retry cache will be added to the namenode for non-idempotent operations. An 
entry in the retry cache will be retained for a configurable period of time. 
The cache will track non-idempotent requests when they complete successfully.
# To identify a request uniquely, the current RPC call ID is not sufficient. 
Additional identifiers will be added to RPC to distinguish requests coming from 
the same client, from two clients on the same machine, and from two clients on 
different machines.

I will post a design by early next week.
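
Decision 1 above could be sketched as a retention-bounded cache. This is a hypothetical illustration, not the eventual design: ExpiringRetryCache is an invented name, and a clock is injected so expiry is deterministic rather than tied to wall time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of a retry cache whose entries expire after a
// configurable lifetime (decision 1 above).
class ExpiringRetryCache {
  private final long lifetimeMillis;
  private final LongSupplier clock; // injected millisecond clock
  // Insertion-ordered: oldest entries come first, so expiry can stop at
  // the first still-live entry.
  private final LinkedHashMap<String, Long> entries = new LinkedHashMap<>();

  ExpiringRetryCache(long lifetimeMillis, LongSupplier clock) {
    this.lifetimeMillis = lifetimeMillis;
    this.clock = clock;
  }

  // Record a successfully completed non-idempotent request.
  synchronized void record(String requestId) {
    expire();
    entries.remove(requestId); // re-insert to keep oldest-first ordering
    entries.put(requestId, clock.getAsLong());
  }

  // True if the request was already processed within the retention window.
  synchronized boolean wasProcessed(String requestId) {
    expire();
    return entries.containsKey(requestId);
  }

  private void expire() {
    long cutoff = clock.getAsLong() - lifetimeMillis;
    Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() < cutoff) {
        it.remove();
      } else {
        break; // everything after this entry is newer
      }
    }
  }
}
```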
