[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

Erik Krogen (JIRA) Mon, 17 Dec 2018 11:39:19 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723295#comment-16723295
 ]


Erik Krogen edited comment on HDFS-12943 at 12/17/18 7:38 PM:
--------------------------------------------------------------

{quote}By the way, my intent was: when reads and writes are combined in a single 
application, how much will the impact be, since it needs to switch?
{quote}
There will only be a potential performance impact when switching from writes 
(sent to Active) to reads (sent to Observer), since the client may need to wait 
some time for the state on the Observer to catch up. Experience when designing 
HDFS-13150 indicated that this delay could be reduced to a few ms when 
properly tuned, which would make the delay of switching from Active to Observer 
negligible. See the [design 
doc|https://issues.apache.org/jira/secure/attachment/12924783/edit-tailing-fast-path-design-v2.pdf],
 especially Appendix A, for more details.

{quote}
Just out of curiosity, do we have write benchmarks with and without ORP, as I didn't 
find any in HDFS-14058 and HDFS-14059?
{quote}
There are some preliminary performance numbers shared in my [earlier 
comment|https://issues.apache.org/jira/browse/HDFS-12943?focusedCommentId=16297483&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16297483]
 in this thread. I'm not aware of any good benchmark numbers produced after 
finishing the feature; maybe [~csun] can provide them?

{quote}
Tried the following test with and without ORF. Came to know the perf impact is based 
on the edit tailing period ("dfs.ha.tail-edits.period"), which defaults to 1m (in tests, 
it's 100ms).
...
IMO, configuring a smaller value (like 100ms) for reading ingress edits puts load on the 
JournalNode until the log roll happens (2 mins by default), as it opens a stream to 
read the edits.
{quote}
I think I now understand the issue that you were facing. To use this feature 
correctly, in addition to setting {{dfs.ha.tail-edits.in-progress}} to true, 
you should also set {{dfs.ha.tail-edits.period}} to a small value; in our case 
I think we use 0 or 1 ms. Your concern about heavier load in the JournalNode 
would have previously been valid, but with the completion of HDFS-13150 and 
{{dfs.ha.tail-edits.in-progress}} enabled, the Standby/Observer no longer 
creates a new stream to tail edits, instead polling for edits via RPC (and thus 
making use of connection keepalive). This greatly reduces the overheads 
involved with each iteration of edit tailing, enabling it to be done much more 
frequently. I created HDFS-14155 to track updating the documentation with this 
information.
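
As a rough illustration of the settings described above, an {{hdfs-site.xml}} fragment might look like the following. The two property names are the ones mentioned above; the {{1ms}} value and its unit suffix are assumptions for illustration and should be checked against the documentation for your release.

{code:xml}
<!-- Sketch only: the values below are assumptions, not tested recommendations. -->
<property>
  <!-- Tail in-progress edit segments, using the RPC-based path from HDFS-13150 -->
  <name>dfs.ha.tail-edits.in-progress</name>
  <value>true</value>
</property>
<property>
  <!-- Poll for new edits very frequently; "0 or 1 ms" per the comment above -->
  <name>dfs.ha.tail-edits.period</name>
  <value>1ms</value>
</property>
{code}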

{quote}i) Did we try with the C/C++ client?
{quote}
We haven't developed any support for these clients, no. They should continue to 
work on clusters with the Observer enabled but will not be able to take 
advantage of the new functionality.

{quote}
ii) Are we planning separate client-side metrics for observer reads? Applications 
like mapred might find them helpful for job counters.
{quote}
There are no metrics like this on the client side at this time; we are relying on 
server-side metrics, but I agree that this could be a useful addition.


> Consistent Reads from Standby Node
> ----------------------------------
>
>                 Key: HDFS-12943
>                 URL: https://issues.apache.org/jira/browse/HDFS-12943
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
