[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272
 ] 

Chen Liang edited comment on HDFS-12943 at 12/17/18 10:45 PM:
--

Hi [~brahmareddy],

Thanks for testing! The timeout issue is interesting. To start with, some performance degradation *from the CLI* is expected, because the CLI creates a new DFSClient for each command, and a fresh DFSClient has to query the state of the NameNodes every time. If the same DFSClient is reused, this is not an issue. I have never seen the second-call issue. Here is output from our cluster (log output omitted), and I think you are right about lowering dfs.ha.tail-edits.period; we had similar numbers here:
{code:java}
$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1
real    0m2.254s
user    0m3.608s
sys     0m0.331s

$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2
real    0m2.159s
user    0m3.855s
sys     0m0.330s{code}
Curious, how many NNs did you have in the test? And were there any errors in the NN logs?
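
For illustration, a minimal sketch of the reuse point above, assuming a hypothetical nameservice name ("mycluster") and paths that are not from our cluster: one FileSystem (and hence one DFSClient) is created with ObserverReadProxyProvider and reused, so the cost of discovering the NameNodes' HA states is paid once rather than per command.
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ReusedClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical nameservice "mycluster"; same proxy provider as in the CLI example above.
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider");
    // One reused FileSystem/DFSClient: the NameNode state lookup is amortized over many RPCs.
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf)) {
      for (int i = 0; i < 100; i++) {
        fs.mkdirs(new Path("/TestsORF-" + i), FsPermission.getDefault());
      }
    }
  }
}
{code}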



> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-17 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723295#comment-16723295
 ] 

Erik Krogen edited comment on HDFS-12943 at 12/17/18 7:38 PM:
--

{quote}By the way, my intent was: when reads and writes are combined in a single application, how much impact will there be, since it needs to switch?
{quote}
There will only be potential performance impact when switching from writes 
(sent to Active) to reads (sent to Observer) since the client may need to wait 
some time for the state on the Observer to catch up. Experience when designing 
HDFS-13150 indicated that this delay time could be reduced to a few ms when 
properly tuned, which would make the delay of switching from Active to Observer 
negligible. See the [design 
doc|https://issues.apache.org/jira/secure/attachment/12924783/edit-tailing-fast-path-design-v2.pdf],
 especially Appendix A, for more details.

{quote}
Just out of curiosity, do we have write benchmarks with and without ORP? I didn't find any in HDFS-14058 or HDFS-14059.
{quote}
There are some preliminary performance numbers shared in my [earlier 
comment|https://issues.apache.org/jira/browse/HDFS-12943?focusedCommentId=16297483=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16297483]
 in this thread. I'm not aware of any good benchmark numbers produced after finishing the feature; maybe [~csun] can provide them?

{quote}
Tried the following test with and without ORF; it turns out the perf impact depends on edit tailing ("dfs.ha.tail-edits.period"), which defaults to 1m (in the tests it's 100ms)...
...
IMO, configuring a small value (like 100ms) for reading in-progress edits puts load on the JournalNode until the log roll happens (2 minutes by default), since it keeps a stream open to read the edits.
{quote}
I think I now understand the issue that you were facing. To use this feature 
correctly, in addition to setting {{dfs.ha.tail-edits.in-progress}} to true, 
you should also set {{dfs.ha.tail-edits.period}} to a small value; in our case 
I think we use 0 or 1 ms. Your concern about heavier load in the JournalNode 
would have previously been valid, but with the completion of HDFS-13150 and 
{{dfs.ha.tail-edits.in-progress}} enabled, the Standby/Observer no longer 
creates a new stream to tail edits, instead polling for edits via RPC (and thus 
making use of connection keepalive). This greatly reduces the overheads 
involved with each iteration of edit tailing, enabling it to be done much more 
frequently. I created HDFS-14155 to track updating the documentation with this 
information.
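
For illustration only, here is a hedged sketch of how those two settings might be expressed; in a real deployment they belong in hdfs-site.xml on the Standby/Observer NameNodes. The key names are the ones discussed above, and the 0 ms period mirrors the value we use.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class FastEditTailingConf {
  public static Configuration observerTailingConf() {
    Configuration conf = new Configuration();
    // Tail in-progress edit log segments (served by the JournalNodes over RPC).
    conf.setBoolean("dfs.ha.tail-edits.in-progress", true);
    // Poll for new edits essentially continuously; assumes the key accepts a time-unit suffix.
    conf.set("dfs.ha.tail-edits.period", "0ms");
    return conf;
  }
}
{code}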

{quote}i) Did we try with a C/C++ client?
{quote}
We haven't developed any support for these clients, no. They should continue to 
work on clusters with the Observer enabled but will not be able to take 
advantage of the new functionality.

{quote}
ii) Are we planning separate client-side metrics for observer reads? Applications like MapReduce might find them helpful for job counters.
{quote}
There are no such metrics on the client side at this time; we rely on server-side metrics. But I agree that this could be a useful addition.



[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-17 Thread Brahma Reddy Battula (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722550#comment-16722550
 ] 

Brahma Reddy Battula edited comment on HDFS-12943 at 12/17/18 2:10 PM:
---

{quote}I think when we discuss a "request", we need to differentiate an RPC 
request originating from a Java application (MapReduce task, etc.) vs. a CLI 
request. The former will be the vast majority of operations on a typical 
cluster, so I would argue that optimizing for the performance and efficiency of 
that usage is much more important.
{quote}
Agreed, I should have mentioned that this was from the CLI. But the getHAServiceState() call from ORP took 2s+, as I mentioned above. By the way, my intent was: when reads and writes are combined in a single application, how much impact will there be, since it needs to switch?

Just out of curiosity, do we have write benchmarks with and without ORP? I didn't find any in HDFS-14058 or HDFS-14059.
{quote}1.Are you running with HDFS-13873? With this patch (only committed 
yesterday so I doubt you have it) the exception thrown should be more 
meaningful.
{quote}
Yes, with the latest HDFS-12943 branch.
{quote}2.Did you remember to enable in-progress edit log tailing?
{quote}
Yes, enabled on all three NNs.
{quote}3.Was this run on an almost completely stagnant cluster (no other 
writes)? This can make the ANN flush its edits to the JNs less frequently, 
increasing the lag time between ANN and Observer.
{quote}
Yes, no other writes.

 
Tried the following test with and without ORF; it turns out the perf impact depends on edit tailing (*dfs.ha.tail-edits.period*), which defaults to 1m (in the tests it's 100ms):
{code:java}
@Test
public void testSimpleRead() throws Exception {
  long avgMkdirs = 0;
  long avgGetFileStatus = 0;
  long avgGetContentSummary = 0;
  int num = 100;
  for (int i = 0; i < num; i++) {
    Path testPath1 = new Path(testPath, "test1" + i);

    // mkdirs is a write, so it must be served by the Active (index 0).
    long startMkdirs = System.currentTimeMillis();
    assertTrue(dfs.mkdirs(testPath1, FsPermission.getDefault()));
    long mkdirsTime = System.currentTimeMillis() - startMkdirs;
    System.out.println("time taken mkdirs: " + i + " : " + mkdirsTime);
    avgMkdirs += mkdirsTime;
    assertSentTo(0);

    // getContentSummary is a read, so it should be served by the Observer (index 2).
    long startGetContentSummary = System.currentTimeMillis();
    dfs.getContentSummary(testPath1);
    long getContentSummaryTime = System.currentTimeMillis() - startGetContentSummary;
    System.out.println("time taken getContentSummary: " + i + " : " + getContentSummaryTime);
    avgGetContentSummary += getContentSummaryTime;
    assertSentTo(2);

    // getFileStatus is also a read, again served by the Observer.
    long startGetFileStatus = System.currentTimeMillis();
    dfs.getFileStatus(testPath1);
    long getFileStatusTime = System.currentTimeMillis() - startGetFileStatus;
    System.out.println("time taken getFileStatus: " + i + " : " + getFileStatusTime);
    avgGetFileStatus += getFileStatusTime;
    assertSentTo(2);
  }
  System.out.println("AVG: mkdirs: " + avgMkdirs / num
      + " getFileStatus: " + avgGetFileStatus / num
      + " getContentSummary: " + avgGetContentSummary / num);
}{code}
IMO, configuring a small value (like 100ms) for reading in-progress edits puts load on the JournalNode until the log roll happens (2 minutes by default), since it keeps a stream open to read the edits.

Apart from the perf, I have the following queries:
 i) Did we try with a C/C++ client?
 ii) Are we planning separate client-side metrics for observer reads? Applications like MapReduce might find them helpful for job counters.

 



[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-14 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721595#comment-16721595
 ] 

Erik Krogen edited comment on HDFS-12943 at 12/14/18 4:50 PM:
--

Hey [~brahmareddy], thanks for trying it out and for the detailed feedback!

I think when we discuss a "request", we need to differentiate an RPC request 
originating from a Java application (MapReduce task, etc.) vs. a CLI request. 
The former will be the vast majority of operations on a typical cluster, so I 
would argue that optimizing for the performance and efficiency of that usage is 
much more important. The ObserverReadProxyProvider does have higher startup 
overheads as it directly polls for the state rather than just blindly trying 
its request; however, in an application which performs more than a few RPCs, 
this cost will be easily amortized away. I don't think it's fair to say that 
"write" performance is degraded simply because {{hdfs dfs -mkdirs}} takes 
longer; a benchmark running 100+ mkdirs would be a better measure IMO. If CLI 
performance is important, such clients can continue to use 
ConfiguredFailoverProxyProvider and communicate with the active directly.

The timeout you have shared is interesting. I suspect that it may be caused by 
the Observer trying to wait for its state to catch up to the stateID requested 
by your getFileInfo. I have a few questions:
# Are you running with HDFS-13873? With this patch (only committed yesterday so 
I doubt you have it) the exception thrown should be more meaningful.
# Did you remember to enable in-progress edit log tailing?
# Was this run on an almost completely stagnant cluster (no other writes)? This 
can make the ANN flush its edits to the JNs less frequently, increasing the lag 
time between ANN and Observer.





[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-13 Thread Brahma Reddy Battula (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720914#comment-16720914
 ] 

Brahma Reddy Battula edited comment on HDFS-12943 at 12/14/18 4:33 AM:
---

Thanks all for the great work here.

I think write requests can be degraded, since they also involve some read requests such as getFileInfo(), getServerDefaults(), ... (getHAServiceState() is newly added).

I just checked mkdir perf; it looks like below.
 * i) getHAServiceState() took 2+ sec (3 getHAServiceState() + 2 getFileInfo() + 1 mkdirs = 6 calls)
 * ii) Every second request gets timed out[1] and the RPC call is skipped from the observer (7 getHAServiceState() + 4 getFileInfo() + 1 mkdirs = 12 calls). Here two getFileInfo() calls were skipped from the observer, hence they succeeded against the Active.

{noformat}
time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.hacluster=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF1
real 0m4.314s
user 0m3.668s
sys 0m0.272s
time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.hacluster=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF2
real 0m22.238s
user 0m3.800s
sys 0m0.248s
{noformat}
 

*Without ObserverReadProxyProvider (2 getFileInfo() + 1 mkdirs() = 3 calls):*
{noformat}
time ./hdfs --loglevel debug dfs  -mkdir /TestsCFP
real 0m2.105s
user 0m3.768s
sys 0m0.592s
{noformat}
*Please correct me if I am missing anything.*

 

timedout[1]: for every second write request I get the following. Did I miss something here? These calls are skipped from the observer.
{noformat}
2018-12-14 11:21:45,312 DEBUG ipc.Client: closing ipc connection to 
vm1/10.*.*.*:65110: 1 millis timeout while waiting for channel to be ready 
for read. ch : java.nio.channels.SocketChannel[connected local=/10.*.*.*:58409 
remote=vm1/10.*.*.*:65110]
java.net.SocketTimeoutException: 1 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.*.*.*:58409 remote=vm1/10.*.*.*:65110]
 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
 at java.io.FilterInputStream.read(FilterInputStream.java:133)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
 at java.io.DataInputStream.readInt(DataInputStream.java:387)
 at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1849)
 at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2018-12-14 11:21:45,313 DEBUG ipc.Client: IPC Client (1006094903) connection to 
vm1/10.*.*.*:65110 from brahma: closed{noformat}
 

 

 




[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-11-15 Thread xiangheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689105#comment-16689105
 ] 

xiangheng edited comment on HDFS-12943 at 11/16/18 7:55 AM:


Hi [~csun], I am very glad to discuss this question with you. I have checked HDFS-14067 and ran a test, and it seems the problem is still unsolved. If you agree, I will open a new issue and try my best to solve this problem. Please let me know if you have any suggestions. Thank you very much.






[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-11-15 Thread xiangheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688954#comment-16688954
 ] 

xiangheng edited comment on HDFS-12943 at 11/16/18 3:51 AM:


Hi [~csun], I am very glad to discuss this question with you. I have checked HDFS-14067, and it seems this problem has not been solved. If you agree, I will propose a new subtask and devote myself to solving it. Please let me know if you have any good suggestions. Thank you very much.






[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-11-12 Thread xiangheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683363#comment-16683363
 ] 

xiangheng edited comment on HDFS-12943 at 11/12/18 8:07 AM:


Thanks [~csun]. I configured hdfs-site.xml according to the plan document and used the haadmin command {{haadmin -transitionToObserver}}, but the transition from SBN to Observer state failed with the message {{transitionToObserver: incorrect arguments}}. Can you describe the observer NameNode configuration in detail? Thank you very much.






[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2017-12-20 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298099#comment-16298099
 ] 

Zhe Zhang edited comment on HDFS-12943 at 12/20/17 8:54 AM:


Thanks [~csun], interesting results! You used only 1 SBN to serve reads, right? In both configurations (with and without stale reads), I assume you were saturating the system? It's interesting to see that with two NNs serving RPCs (1 ANN + 1 SBN), the throughput actually more than doubled compared to 1 ANN. Did you use Namesystem unfair locking?

If I understand correctly, both your test and the Dynamometer test are more like trace-driven micro-benchmarks, where a container issues a certain type of RPC at a given timestamp. Chris was probably referring to a test job with "real code" like {{if !file_exists(path) then create_file(path)}}, where the blocking relationships between calls are mimicked.
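
As a hypothetical illustration of such "real code", where a read RPC gates the subsequent write RPC (path and class name are placeholders, not from any actual test job):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExistsThenCreate {
  public static void main(String[] args) throws Exception {
    // Hypothetical path; the point is the blocking dependency between the two RPCs.
    Path path = new Path(args.length > 0 ? args[0] : "/tmp/example");
    try (FileSystem fs = FileSystem.get(new Configuration())) {
      // create() is only issued after exists() returns, so read latency
      // (e.g. waiting on an Observer to catch up) directly delays the following write.
      if (!fs.exists(path)) {
        fs.create(path).close();
      }
    }
  }
}
{code}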

[~chris.douglas]: the "natural" increase of write traffic is an interesting question. I don't think the feature will increase the total amount of write RPCs (a given job will still issue that many writes overall). Writes within a job could become more bursty, but the job itself will finish sooner. Statistically, the thousands of jobs on the cluster would probably smooth out this increased burstiness.





