[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272 ] Chen Liang edited comment on HDFS-12943 at 12/17/18 10:45 PM:

Hi [~brahmareddy], Thanks for testing! The timeout issue seems interesting. To start with, some performance degradation is expected *from the CLI*, because the CLI initiates a new DFSClient for every command, and a fresh DFSClient has to fetch the state of the NameNodes each time. But if the same DFSClient is reused, this is not an issue. I have never seen the second-call issue. Here is output from our cluster (log output omitted), and I think you are right about lowering dfs.ha.tail-edits.period; we saw similar numbers here:
{code:java}
$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1

real    0m2.254s
user    0m3.608s
sys     0m0.331s

$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2

real    0m2.159s
user    0m3.855s
sys     0m0.330s{code}
Curious, how many NameNodes did you have in the testing? And were there any errors in the NameNode logs?
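To make the reuse point concrete, here is a minimal Java sketch (my illustration, not project code; {{mycluster}} is a hypothetical nameservice) of a long-lived client that pays the NameNode state probes once, instead of once per command as the CLI does:
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ReusedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider");
    // One FileSystem (and thus one DFSClient) for all operations: the
    // NameNode state lookups happen once, not once per command as in the CLI.
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf)) {
      for (int i = 1; i <= 100; i++) {
        fs.mkdirs(new Path("/TestsORF" + i));
      }
    }
  }
}{code}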
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723295#comment-16723295 ] Erik Krogen edited comment on HDFS-12943 at 12/17/18 7:38 PM:

{quote}By the way, my intent was: when read/write are combined in a single application, how much will the impact be, as it needs to switch?{quote}
There will only be a potential performance impact when switching from writes (sent to the Active) to reads (sent to the Observer), since the client may need to wait some time for the state on the Observer to catch up. Experience when designing HDFS-13150 indicated that this delay could be reduced to a few ms when properly tuned, which would make the delay of switching from Active to Observer negligible. See the [design doc|https://issues.apache.org/jira/secure/attachment/12924783/edit-tailing-fast-path-design-v2.pdf], especially Appendix A, for more details.
{quote}Just for curiosity, do we have write benchmarks with and without ORP? I didn't find any in HDFS-14058 or HDFS-14059.{quote}
There are some preliminary performance numbers shared in my [earlier comment|https://issues.apache.org/jira/browse/HDFS-12943?focusedCommentId=16297483=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16297483] in this thread. I'm not aware of any good benchmark numbers produced after finishing the feature; maybe [~csun] can provide them?
{quote}Tried the following test with and without ORF. Came to know the perf impact is based on edit tailing ("dfs.ha.tail-edits.period"), which defaults to 1m (in the tests it was 100ms). ... IMO, configuring a small value (like 100ms) for reading in-progress edits puts load on the JournalNode until a log roll happens (2 mins by default), as it opens a stream to read the edits.{quote}
I think I now understand the issue that you were facing. To use this feature correctly, in addition to setting {{dfs.ha.tail-edits.in-progress}} to true, you should also set {{dfs.ha.tail-edits.period}} to a small value; in our case I think we use 0 or 1 ms. Your concern about heavier load on the JournalNode would previously have been valid, but with the completion of HDFS-13150 and {{dfs.ha.tail-edits.in-progress}} enabled, the Standby/Observer no longer creates a new stream to tail edits, instead polling for edits via RPC (and thus making use of connection keepalive). This greatly reduces the overheads involved with each iteration of edit tailing, enabling it to be done much more frequently. I created HDFS-14155 to track updating the documentation with this information.
{quote}i) Did we try with a C/C++ client?{quote}
We haven't developed any support for these clients, no. They should continue to work on clusters with the Observer enabled, but will not be able to take advantage of the new functionality.
{quote}ii) Are we planning separate metrics for observer reads (client side)? For applications like MapReduce it might be helpful for job counters.{quote}
There are no metrics like this on the client side at this time; we are relying on server-side metrics. But I agree that this could be a useful addition.
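To make that tuning concrete, a minimal sketch (my illustration, not project code; on a real cluster these keys would be set in hdfs-site.xml for the NameNodes rather than programmatically):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class FastTailingConf {
  public static Configuration create() {
    Configuration conf = new HdfsConfiguration();
    // Tail in-progress edit segments from the JournalNodes over RPC
    // (the HDFS-13150 fast path) instead of opening a new stream each time.
    conf.setBoolean("dfs.ha.tail-edits.in-progress", true);
    // Tail very frequently; the RPC-based fast path makes this cheap.
    conf.setTimeDuration("dfs.ha.tail-edits.period", 1, TimeUnit.MILLISECONDS);
    return conf;
  }
}{code}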
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722550#comment-16722550 ] Brahma Reddy Battula edited comment on HDFS-12943 at 12/17/18 2:10 PM:

{quote}I think when we discuss a "request", we need to differentiate an RPC request originating from a Java application (MapReduce task, etc.) vs. a CLI request. The former will be the vast majority of operations on a typical cluster, so I would argue that optimizing for the performance and efficiency of that usage is much more important.{quote}
Agree, I could have mentioned the CLI. But the getHAServiceState() call from ORP took 2s+, as I mentioned above. By the way, my intent was: when read/write are combined in a single application, how much will the impact be, as it needs to switch? Just for curiosity, do we have write benchmarks with and without ORP? I didn't find any in HDFS-14058 or HDFS-14059.
{quote}1. Are you running with HDFS-13873? With this patch (only committed yesterday so I doubt you have it) the exception thrown should be more meaningful.{quote}
Yes, with the latest HDFS-12943 branch.
{quote}2. Did you remember to enable in-progress edit log tailing?{quote}
Yes, enabled for all three NNs.
{quote}3. Was this run on an almost completely stagnant cluster (no other writes)? This can make the ANN flush its edits to the JNs less frequently, increasing the lag time between ANN and Observer.{quote}
Yes, no other writes. Tried the following test with and without ORF. Came to know the perf impact is based on edit tailing (*"dfs.ha.tail-edits.period", which defaults to 1m; in the tests it was 100ms*):
{code:java}
@Test
public void testSimpleRead() throws Exception {
  long avg = 0;
  long avgL = 0;
  long avgC = 0;
  int num = 100;
  for (int i = 0; i < num; i++) {
    Path testPath1 = new Path(testPath, "test1" + i);

    long startTime = System.currentTimeMillis();
    assertTrue(dfs.mkdirs(testPath1, FsPermission.getDefault()));
    long l = System.currentTimeMillis() - startTime;
    System.out.println("time TakenL1: " + i + " : " + l);
    avg = avg + l;
    assertSentTo(0);

    long startTime2 = System.currentTimeMillis();
    dfs.getContentSummary(testPath1);
    long C = System.currentTimeMillis() - startTime2;
    System.out.println("time TakengetContentSummary: " + i + " : " + C);
    avgC = avgC + C;
    assertSentTo(2);

    long startTime1 = System.currentTimeMillis();
    dfs.getFileStatus(testPath1);
    long L = System.currentTimeMillis() - startTime1;
    System.out.println("time TakengetFileStatus: " + i + " : " + L);
    avgL = avgL + L;
    assertSentTo(2);
  }
  System.out.println("AVG: mkDir: " + avg / num + " List: " + avgL / num + " Cont: " + avgC / num);
}{code}
IMO, configuring a small value (like 100ms) for reading in-progress edits puts load on the JournalNode until a log roll happens (2 mins by default), as it opens a stream to read the edits. Apart from the perf, I have the following queries:
i) Did we try with a C/C++ client?
ii) Are we planning separate metrics for observer reads (client side)? For applications like MapReduce it might be helpful for job counters.
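One measurement note on the test above: {{System.currentTimeMillis()}} truncates to whole milliseconds, so fast observer reads can average out to 0. A hedged helper (my addition, not part of the original test) using {{System.nanoTime()}} gives finer resolution:
{code:java}
import java.io.IOException;

/** Hypothetical timing helper, not part of the original test. */
final class CallTimer {
  interface IoCall { void run() throws IOException; }

  /** Times a single HDFS call with sub-millisecond resolution. */
  static long timeMicros(IoCall call) throws IOException {
    long t0 = System.nanoTime();  // monotonic clock, immune to wall-clock jumps
    call.run();
    return (System.nanoTime() - t0) / 1_000L;
  }
}
// Usage inside the loop above, e.g.:
//   long us = CallTimer.timeMicros(() -> dfs.getFileStatus(testPath1));{code}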
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721595#comment-16721595 ] Erik Krogen edited comment on HDFS-12943 at 12/14/18 4:50 PM:

Hey [~brahmareddy], thanks for trying it out and for the detailed feedback!
I think when we discuss a "request", we need to differentiate an RPC request originating from a Java application (MapReduce task, etc.) vs. a CLI request. The former will be the vast majority of operations on a typical cluster, so I would argue that optimizing for the performance and efficiency of that usage is much more important. The ObserverReadProxyProvider does have higher startup overheads, as it directly polls for the state rather than just blindly trying its request; however, in an application which performs more than a few RPCs, this cost will be easily amortized away. I don't think it's fair to say that "write" performance is degraded simply because {{hdfs dfs -mkdirs}} takes longer; a benchmark running 100+ mkdirs would be a better measure IMO. If CLI performance is important, such clients can continue to use ConfiguredFailoverProxyProvider and communicate with the Active directly.
The timeout you have shared is interesting. I suspect that it may be caused by the Observer trying to wait for its state to catch up to the stateID requested by your getFileInfo. I have a few questions:
# Are you running with HDFS-13873? With this patch (only committed yesterday, so I doubt you have it) the exception thrown should be more meaningful.
# Did you remember to enable in-progress edit log tailing?
# Was this run on an almost completely stagnant cluster (no other writes)? This can make the ANN flush its edits to the JNs less frequently, increasing the lag time between ANN and Observer.
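To illustrate that last point, a minimal sketch (my illustration; {{mycluster}} is a hypothetical nameservice) of choosing the proxy provider per client rather than cluster-wide:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ProviderChoice {
  /** Long-lived apps amortize the extra state probes and gain observer reads. */
  public static Configuration forLongLivedApp() {
    Configuration conf = new HdfsConfiguration();
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider");
    return conf;
  }

  /** Short-lived CLI-style clients can keep talking to the Active directly. */
  public static Configuration forShortLivedCli() {
    Configuration conf = new HdfsConfiguration();
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    return conf;
  }
}{code}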
> Consistent Reads from Standby Node
> ----------------------------------
>
>                 Key: HDFS-12943
>                 URL: https://issues.apache.org/jira/browse/HDFS-12943
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: ConsistentReadsFromStandbyNode.pdf, ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the NameNodes are coordinated via the journal. It is natural to consider StandbyNode as a read-only replica. As with any replicated distributed system the problem of stale reads should be resolved. Our main goal is to provide reads from standby in a consistent way in order to enable a wide range of existing applications running on top of HDFS.
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720914#comment-16720914 ] Brahma Reddy Battula edited comment on HDFS-12943 at 12/14/18 4:33 AM:

Thanks all for the great work here. I think write requests can be degraded..? As they also contain some read requests like getFileInfo(), getServerDefaults(), ... (getHAServiceState() is newly added). I just checked mkdir performance; it's like below.
* i) getHAServiceState() took 2+ sec (3 getHAServiceState() + 2 getFileInfo() + 1 mkdirs = 6 calls)
* ii) Every second request gets timed out [1] and the RPC call is skipped from the Observer (7 getHAServiceState() + 4 getFileInfo() + 1 mkdirs = 12 calls). Here two getFileInfo() calls were skipped from the Observer, hence they succeeded against the Active.
{noformat}
$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.hacluster=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1

real    0m4.314s
user    0m3.668s
sys     0m0.272s

$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.hacluster=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2

real    0m22.238s
user    0m3.800s
sys     0m0.248s
{noformat}
*without ObserverReadProxyProvider (2 getFileInfo() + 1 mkdirs() = 3 calls)*
{noformat}
$ time ./hdfs --loglevel debug dfs -mkdir /TestsCFP

real    0m2.105s
user    0m3.768s
sys     0m0.592s
{noformat}
*Please correct me if I am missing anything.*
[1] Timed out: on every second write request I get the following; did I miss something here? These calls are skipped from the Observer.
{noformat}
2018-12-14 11:21:45,312 DEBUG ipc.Client: closing ipc connection to vm1/10.*.*.*:65110: 1 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.*.*.*:58409 remote=vm1/10.*.*.*:65110]
java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.*.*.*:58409 remote=vm1/10.*.*.*:65110]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1849)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2018-12-14 11:21:45,313 DEBUG ipc.Client: IPC Client (1006094903) connection to vm1/10.*.*.*:65110 from brahma: closed
{noformat}
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689105#comment-16689105 ] xiangheng edited comment on HDFS-12943 at 11/16/18 7:55 AM:

Hi [~csun], I am very glad to discuss this question with you. I have checked HDFS-14067 and ran a test; it seems that the problem is still unsolved. If you agree, I will propose a new issue and try my best to solve this problem. Please let me know if you have any suggestions. Thank you very much.
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688954#comment-16688954 ] xiangheng edited comment on HDFS-12943 at 11/16/18 3:51 AM:

Hi [~csun], I am very glad to discuss this question with you. I have checked HDFS-14067, and it seems that this problem has not been solved. If you agree, I will propose a new subtask and devote myself to solving this problem. Please let me know if you have any good suggestions. Thank you very much.
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683363#comment-16683363 ] xiangheng edited comment on HDFS-12943 at 11/12/18 8:07 AM:

Thanks [~csun]. I configured hdfs-site.xml according to the plan document and used the haadmin command {{haadmin -transitionToObserver}}, but the transition from SBN to Observer state failed with the prompt message {{transitionToObserver: incorrect arguments}}. Can you describe the configuration related to the observer NameNode in detail? Thank you very much.
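For what it's worth, {{incorrect arguments}} from haadmin subcommands usually means a missing or extra argument: {{-transitionToObserver}} takes the service id of the target NameNode. A hedged example, assuming a hypothetical NameNode service id {{nn3}} and a build from the HDFS-12943 branch where the subcommand exists:
{noformat}
hdfs haadmin -transitionToObserver nn3
{noformat}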
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298099#comment-16298099 ] Zhe Zhang edited comment on HDFS-12943 at 12/20/17 8:54 AM:

Thanks [~csun], interesting results! You used only 1 SBN to serve reads, right? In both configurations (with and without stale reads), I assume you were saturating the system? It's interesting to see that with two NNs serving RPCs (1 ANN + 1 SBN), the throughput actually more than doubled the throughput with 1 ANN. Did you use Namesystem unfair locking?
If I understand correctly, both your test and the Dynamometer test are more like trace-driven micro benchmarks, where a container issues a certain type of RPC at a given timestamp. Chris was probably referring to a test job with "real code" like {{if !file_exists(path) then create_file(path)}}, where the blocking relationships between calls are mimicked.
[~chris.douglas]: the "natural" increase of write traffic is an interesting question. I don't think the feature will increase the total amount of write RPCs (a given job will still issue that many writes overall). Writes within a job could become more bursty, but the job itself will run for a shorter time. Statistically, the 1000s of jobs on the cluster would probably smooth out this increased burstiness.
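As a concrete rendering of that "real code" pattern, a minimal Java sketch (my illustration, assuming a caller supplies the {{FileSystem}} and {{Path}}): the write depends on the preceding read, which is exactly the read-after-write ordering that consistent observer reads must preserve.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadThenWrite {
  // The create() happens only if the exists() check (a read, potentially
  // served by an Observer) returned false -- a stale read would change behavior.
  static void ensureFile(FileSystem fs, Path path) throws IOException {
    if (!fs.exists(path)) {       // read RPC
      fs.create(path).close();    // write RPC, must go to the Active
    }
  }
}{code}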