[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723295#comment-16723295 ]
Erik Krogen edited comment on HDFS-12943 at 12/17/18 7:38 PM:
--------------------------------------------------------------

{quote}Bytheway My intent was when read/write are combined in single application how much will be impact as it needs switch?
{quote}
There will only be potential performance impact when switching from writes (sent to Active) to reads (sent to Observer) since the client may need to wait some time for the state on the Observer to catch up. Experience when designing HDFS-13150 indicated that this delay time could be reduced to a few ms when properly tuned, which would make the delay of switching from Active to Observer negligible. See the [design doc|https://issues.apache.org/jira/secure/attachment/12924783/edit-tailing-fast-path-design-v2.pdf], especially Appendix A, for more details.

{quote}
Just for curiosity,,do we've write benchmarks with and without ORP,as I didn't find from HDFS-14058 and HDFS-14059?
{quote}
There are some preliminary performance numbers shared in my [earlier comment|https://issues.apache.org/jira/browse/HDFS-12943?focusedCommentId=16297483&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16297483] in this thread. I'm not aware of any good benchmark numbers produced after finishing the feature, maybe [~csun] can provide them?

{quote}
Tried the following test with and with ORF,Came to know it's(perf impact) based on the tailing edits("dfs.ha.tail-edits.period") which is default 1m.(In tests, it's 100MS)..
...
IMO,Configuring less value(like 100ms) for reading ingress edits put load on journalnode till log roll happens(2mins by default),as it's open the stream to read the edits.
{quote}
I think I now understand the issue that you were facing. To use this feature correctly, in addition to setting {{dfs.ha.tail-edits.in-progress}} to true, you should also set {{dfs.ha.tail-edits.period}} to a small value; in our case I think we use 0 or 1 ms.
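As a rough illustration of the two settings mentioned above, an {{hdfs-site.xml}} fragment might look like the sketch below. The property names are the ones given in this thread; the {{0ms}} value is an assumption that the release in use accepts a time-unit suffix for {{dfs.ha.tail-edits.period}}, and the right value for a given cluster is something to tune rather than copy.

```xml
<!-- Sketch only: values are illustrative, not a recommendation for every cluster. -->
<configuration>
  <!-- Let the Standby/Observer tail edits from in-progress segments. -->
  <property>
    <name>dfs.ha.tail-edits.in-progress</name>
    <value>true</value>
  </property>
  <!-- Poll for new edits very frequently (the comment above mentions 0 or 1 ms).
       Assumes the time-unit suffix is supported by the Hadoop version in use. -->
  <property>
    <name>dfs.ha.tail-edits.period</name>
    <value>0ms</value>
  </property>
</configuration>
```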
Your concern about heavier load in the JournalNode would have previously been valid, but with the completion of HDFS-13150 and {{dfs.ha.tail-edits.in-progress}} enabled, the Standby/Observer no longer creates a new stream to tail edits, instead polling for edits via RPC (and thus making use of connection keepalive). This greatly reduces the overhead involved with each iteration of edit tailing, enabling it to be done much more frequently. I created HDFS-14155 to track updating the documentation with this information.

{quote}i) Did we try with C/CPP client..?
{quote}
We haven't developed any support for these clients, no. They should continue to work on clusters with the Observer enabled but will not be able to take advantage of the new functionality.

{quote}
ii)are we planning separate metrics for observer reads(Client Side),Application like mapred might helpful for job counters?
{quote}
There are no metrics like this on the client side at this time; we are relying on server-side metrics, but I agree that this could be a useful addition.

> Consistent Reads from Standby Node
> ----------------------------------
>
>                 Key: HDFS-12943
>                 URL: https://issues.apache.org/jira/browse/HDFS-12943
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: ConsistentReadsFromStandbyNode.pdf,
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch,
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the
> NameNodes are coordinated via the journal. It is natural to consider
> StandbyNode as a read-only replica. As with any replicated distributed system
> the problem of stale reads should be resolved. Our main goal is to provide
> reads from standby in a consistent way in order to enable a wide range of
> existing applications running on top of HDFS.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org