[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-07-31 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564576#comment-16564576
 ] 

Chen Liang edited comment on HDFS-13767 at 8/1/18 12:59 AM:


Post WIP.v004 patch. The main changes are:
 # Changed to {{getCorrectLastAppliedOrWrittenTxId}} as Erik suggested.
 # Added a simple unit test and made some minor changes to {{TestObserverNode}}.

I ran into some issues when running the tests; here are my suspicions about 
TestObserverNode:

Currently with msync, every single call to the Observer needs to catch up on 
the state id. When {{ObserverReadProxyProvider}} is created, it makes 
{{reportBadBlocks}} and {{checkAccess}} calls. These calls also need to catch 
up on the state id. This means that every time an {{ObserverReadProxyProvider}} 
is created there will be a wait... And since every unit test creates a dfs 
cluster, every single unit test in {{TestObserverNode}} incurs such a wait. I 
could set the period to a small value, but that introduces other issues, 
because several tests depend on explicitly made edit tailing calls. So I set 
the periods to a smaller, but not too small, number... and the tests still take 
quite some time to run.

I think the right fix would be for {{ObserverReadProxyProvider}} not to make 
those calls on creation, but instead to make a new special call that bypasses 
the catching up (if possible). This is one of those hacky pieces of code we are 
planning to optimize in the future. Also, the ongoing work in HDFS-13523 may 
simplify the tests. For now, I tend to believe that having these slow tests is 
fine.


was (Author: vagarychen):
Post WIP.v004 patch. The main changes are:
 # changed to {{getCorrectLastAppliedOrWrittenTxId}} as Erik suggested.
 # Add a simple unit test and had some minor change to {{TestObserverNode}}.

I was encountering some issues when running test, here are some suspects I had 
on TestObserverNode:

Currently with msync, every single call to Observer needs to catch up state id. 
While when {{ObserverReadProxyProvider}} is created, it makes 
{{reportBadBlocks}} and {{checkAccess}} calls. These calls are also read, so 
they also need to catch up state id. This means every time when  
{{ObserverReadProxyProvider}} is created there will be a wait... And since 
every unit test creates a dfs cluster, for every single unit test in 
{{TestObserverNode}}, there will be such a wait. I could set the period to a 
small time, but that introduces other issues, because several tests depend on 
explicitly made edits tailing calls. So I set the periods to a smaller, but not 
too small number...and tests still quite some time to run.

I think the right fix would be to have {{ObserverReadProxyProvider}} not make 
those read calls on being created. This is one of those hacky code we are 
planing to optimize in the future. And also, the ongoing work HDFS-13523 may 
also simplify the tests. For now, I tend to believe having these slow tests may 
be fine.

> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13767.WIP.001.patch, HDFS-13767.WIP.002.patch, 
> HDFS-13767.WIP.003.patch, HDFS-13767.WIP.004.patch
>
>
> This is a followup on HDFS-13688, where msync API is introduced to 
> {{ClientProtocol}} but the server side implementation is missing. This 
> Jira is to implement the server side logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568674#comment-16568674
 ] 

Plamen Jeliazkov edited comment on HDFS-13767 at 8/3/18 7:44 PM:
-

Hey Chen,

Change looks pretty good so far; two things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface, but I 
think it's clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would allow the server-side version of 
AlignmentContext to better handle deferring / re-queueing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However, I found that the test would fail if I did, which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call, but it seems to be updating anyway.
I also tried stopping the _EditLogTailer_ on the ObserverNode, but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.




was (Author: zero45):
Hey Chen,

Change looks pretty good so far; two things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface but I 
think its clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would let us allow the server-side version of 
AlignmentContext to better handle defering / re-queue'ing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However I found that the test would fail if I did. Which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
{{{rollEditLogAndTail}}} call but seems it is updating anyway.
I tried also stopping the {{{EditLogTailer}}} on the ObserverNode but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.



> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13767-HDFS-12943.001.patch, 
> HDFS-13767-HDFS-12943.002.patch, HDFS-13767.WIP.001.patch, 
> HDFS-13767.WIP.002.patch, HDFS-13767.WIP.003.patch, HDFS-13767.WIP.004.patch
>
>
> This is a followup on HDFS-13688, where msync API is introduced to 
> {{ClientProtocol}} but the server side implementation is missing. This 
> Jira is to implement the server side logic.






[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568674#comment-16568674
 ] 

Plamen Jeliazkov edited comment on HDFS-13767 at 8/3/18 7:44 PM:
-

Hey Chen,

Change looks pretty good so far; two things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface, but I 
think it's clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would allow the server-side version of 
AlignmentContext to better handle deferring / re-queueing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However, I found that the test would fail if I did, which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call, but it seems to be updating anyway.
I also tried stopping the _EditLogTailer_ on the ObserverNode, but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.




was (Author: zero45):
Hey Chen,

Change looks pretty good so far; two things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface but I 
think its clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would let us allow the server-side version of 
AlignmentContext to better handle defering / re-queue'ing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However I found that the test would fail if I did. Which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail _call but seems it is updating anyway.
I tried also stopping the _EditLogTailer_ on the ObserverNode but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.






[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568674#comment-16568674
 ] 

Plamen Jeliazkov edited comment on HDFS-13767 at 8/3/18 7:56 PM:
-

Hey Chen,

Change looks pretty good so far; a couple of things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface, but I 
think it's clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would allow the server-side version of 
AlignmentContext to better handle deferring / re-queueing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However, I found that the test would fail if I did, which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call, but it seems to be updating anyway.
I also tried stopping the _EditLogTailer_ on the ObserverNode, but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.

(3) So far these patches don't actually implement _NameNodeRpcServer.msync()_. 
Should we rename it?

(4) [~xkrogen], [~shv], and others -- I was curious about making msync pause at 
the server level. What if we had msync hang on the client side instead? This 
way we aren't impacting Observer queues with many deferred calls. We could 
simply have the client check the Observers' states, so that rather than 
re-queueing we throw a StandbyException (or something similar) and force the 
client to possibly pause and retry the msync call on the same or a different 
Observer. If the msync call succeeds, then we know we can use the particular 
Observer that it succeeded on. Thoughts?




was (Author: zero45):
Hey Chen,

Change looks pretty good so far; two things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface but I 
think its clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would let us allow the server-side version of 
AlignmentContext to better handle defering / re-queue'ing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However I found that the test would fail if I did. Which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call but seems it is updating anyway.
I tried also stopping the _EditLogTailer_ on the ObserverNode but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.






[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568674#comment-16568674
 ] 

Plamen Jeliazkov edited comment on HDFS-13767 at 8/3/18 7:57 PM:
-

Hey Chen,

Change looks pretty good so far; a couple of things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface, but I 
think it's clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would allow the server-side version of 
AlignmentContext to better handle deferring / re-queueing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However, I found that the test would fail if I did, which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call, but it seems to be updating anyway.
I also tried stopping the _EditLogTailer_ on the ObserverNode, but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.

(3) So far these patches don't actually implement _NameNodeRpcServer.msync()_. 
Should we rename it?

(4) [~xkrogen], [~shv], and others -- I was curious about making msync pause at 
the server level. What if we had msync hang on the client side instead? This 
way we aren't impacting Observer queues with many deferred calls. We could 
simply have the client check the Observers' states, so that rather than 
re-queueing we throw a StandbyException (or something similar) and force the 
client to possibly pause and retry the msync call on the same or a different 
Observer. If the msync call succeeds, then we know we can use the particular 
Observer that it succeeded on. Thoughts?




was (Author: zero45):
Hey Chen,

Change looks pretty good so far; couple things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface but I 
think its clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would let us allow the server-side version of 
AlignmentContext to better handle defering / re-queue'ing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However I found that the test would fail if I did. Which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call but seems it is updating anyway.
I tried also stopping the _EditLogTailer_ on the ObserverNode but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.

(3) So far these patches don't actually implement _NameNodeRpcServer.msync()_. 
Should we rename it?

(4) [~xkrogen], [~shv], and others -- I was curios about making msync pause at 
the server level. What if we had msync hang at the client side instead? This 
way we aren't impacting Observer queues with many deferred calls. We could 
simply have the client check up with the Observers' states and so rather than 
re-queue'ing we throw a StandbyException (or something) and force the client to 
maybe pause and retry the msync call on the same or a different Observer. If 
the msync call succeeds then we know we can use the particular Observer that it 
succeeded on. Thoughts?






[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Plamen Jeliazkov (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568674#comment-16568674
 ] 

Plamen Jeliazkov edited comment on HDFS-13767 at 8/3/18 9:22 PM:
-

Hey Chen,

Change looks pretty good so far; a couple of things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface, but I 
think it's clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would allow the server-side version of 
AlignmentContext to better handle deferring / re-queueing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However, I found that the test would fail if I did, which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call, but it seems to be updating anyway.
I also tried stopping the _EditLogTailer_ on the ObserverNode, but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.
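
The behaviour expected in (2) can be illustrated with a toy model that does not 
use any HDFS classes. {{FakeObserver}}, {{tailEdits}} and the txid value 5 are 
made up for illustration only. The sketch just shows why the assert should hold 
regardless of how long the sleep is, provided nothing advances the Observer's 
applied txid in the background.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a read that blocks until edits are tailed; not HDFS code. */
class BlockingReadSketch {

  /** Hypothetical Observer that serves a read only once it has caught up. */
  static final class FakeObserver {
    private long lastAppliedTxId = 0;

    synchronized void read(long clientSeenTxId) throws InterruptedException {
      while (lastAppliedTxId < clientSeenTxId) {
        wait(); // parked until tailEdits() advances the applied txid
      }
    }

    synchronized void tailEdits(long newTxId) {
      lastAppliedTxId = Math.max(lastAppliedTxId, newTxId);
      notifyAll();
    }
  }

  /** Returns {read succeeded before tailing, read succeeded after tailing}. */
  static boolean[] runScenario(long sleepBeforeAssertMs) {
    FakeObserver observer = new FakeObserver();
    AtomicBoolean readSucceed = new AtomicBoolean(false);
    Thread reader = new Thread(() -> {
      try {
        observer.read(5L);
        readSucceed.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    try {
      Thread.sleep(sleepBeforeAssertMs);   // the extra delay added in the test
      boolean before = readSucceed.get();  // corresponds to the assertFalse
      observer.tailEdits(5L);              // stands in for rollEditLogAndTail
      reader.join();
      return new boolean[] {before, readSucceed.get()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = runScenario(200);
    System.out.println("before tail: " + result[0] + ", after tail: " + result[1]);
  }
}
```

In this model the read cannot complete before {{tailEdits}} is called, so a 
failure of the first check would mean that something else is advancing the 
applied txid.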

(3) So far these patches don't actually implement _NameNodeRpcServer.msync()_. 
Should we re-title the JIRA?

(4) [~xkrogen], [~shv], and others -- I was curious about making msync pause at 
the server level. What if we had msync hang on the client side instead? This 
way we aren't impacting Observer queues with many deferred calls. We could 
simply have the client check the Observers' states, so that rather than 
re-queueing we throw a StandbyException (or something similar) and force the 
client to possibly pause and retry the msync call on the same or a different 
Observer. If the msync call succeeds, then we know we can use the particular 
Observer that it succeeded on. Thoughts?
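
A minimal sketch of the client-side idea in (4), using no HDFS classes. The 
{{Observer}} interface, {{ObserverBehindException}} (standing in for the 
"StandbyException or something" above), {{observerAt}} and {{pickObserver}} are 
all hypothetical names; the retry count and pause are arbitrary assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of client-side msync retry across Observers; not HDFS code. */
class ClientRetrySketch {

  /** Hypothetical signal that an Observer is behind the client's state id. */
  static final class ObserverBehindException extends Exception {
    ObserverBehindException(String message) {
      super(message);
    }
  }

  /** Hypothetical minimal Observer handle. */
  interface Observer {
    void msync(long clientStateId) throws ObserverBehindException;
  }

  /** Builds a fake Observer that has applied edits up to appliedTxId. */
  static Observer observerAt(long appliedTxId) {
    return clientStateId -> {
      if (appliedTxId < clientStateId) {
        throw new ObserverBehindException(
            "applied " + appliedTxId + " < requested " + clientStateId);
      }
    };
  }

  /** Returns the index of the first Observer whose msync succeeds, or -1. */
  static int pickObserver(List<Observer> observers, long clientStateId,
      int maxRounds, long pauseMs) {
    for (int round = 0; round < maxRounds; round++) {
      for (int i = 0; i < observers.size(); i++) {
        try {
          observers.get(i).msync(clientStateId);
          return i; // this Observer is caught up and can serve reads
        } catch (ObserverBehindException e) {
          // This Observer is behind; fall through and try the next one.
        }
      }
      try {
        Thread.sleep(pauseMs); // pause before retrying the whole list
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -1;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Observer> observers = Arrays.asList(observerAt(3), observerAt(10));
    System.out.println(pickObserver(observers, 7, 3, 10)); // prints 1
  }
}
```

Nothing waits in a server queue in this model, but each failed attempt is an 
extra call made by the client.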




was (Author: zero45):
Hey Chen,

Change looks pretty good so far; couple things.

(1) I like the idea of splitting AlignmentContext into client and server 
interfaces. I think originally Konstantin wanted just a single interface but I 
think its clear there is a difference. I would not mind picking up that JIRA if 
we want to go ahead with it. This would let us allow the server-side version of 
AlignmentContext to better handle defering / re-queue'ing calls.

(2) A concern about the unit test -- I tried to add a 10 second sleep between 
the thread start and the assert:
{code:java}
assertFalse(readSucceed.get());
{code}
However I found that the test would fail if I did. Which makes me question the 
unit test.
I would expect the 'getFileStatus' to basically hang until the 
_rollEditLogAndTail_ call but seems it is updating anyway.
I tried also stopping the _EditLogTailer_ on the ObserverNode but it still 
updated anyway.
Please let me know if I am missing something about this test; I will look 
further.

(3) So far these patches don't actually implement _NameNodeRpcServer.msync()_. 
Should we rename it?

(4) [~xkrogen], [~shv], and others -- I was curious about making msync pause at 
the server level. What if we had msync hang at the client side instead? This 
way we aren't impacting Observer queues with many deferred calls. We could 
simply have the client check up with the Observers' states and so rather than 
re-queue'ing we throw a StandbyException (or something) and force the client to 
maybe pause and retry the msync call on the same or a different Observer. If 
the msync call succeeds then we know we can use the particular Observer that it 
succeeded on. Thoughts?






[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568843#comment-16568843
 ] 

Chen Liang edited comment on HDFS-13767 at 8/3/18 10:12 PM:


Thanks for the comments, [~zero45].

bq. (2) A concern about the unit test...
Thanks for trying this. I think it might be related to the recent update that 
enabled in-progress edit tailing. If I disable in-progress edit tailing for 
this test and then add the delay you mentioned, the test succeeds, but takes 
longer (as expected). I will need to look deeper to see how in-progress edit 
tailing interacts with this. [~xkrogen], any quick insights on this?

bq. (3) So far these patches don't actually ...
The only reason there is an msync in {{NameNodeRpcServer}} is that msync is 
part of {{ClientProtocol}}, which NameNodeRpcServer implements. So changing 
this in NameNodeRpcServer won't compile. I guess msync is a little bit 
different from other RPC calls: from the client side, it's an RPC call like any 
other, but on the server side, all its logic is handled in the RPC layer, not 
in the NN layer at all.


was (Author: vagarychen):
Thanks for comments [~zero45].

bq. (2) A concern about the unit test...
Thanks for trying this. I think it might be related to the recent update that 
enabled in-progress edit tail. Because if I disable in-progress edit tail for 
this test, then add the delay you mentioned, the test will succeed, but took 
longer (expected). I will need to look deeper to see how in-progress edit tail 
worked with this. [~xkrogen] any fast insights on this?

bq. (3) So far these patches don't actually ...
The only reason there is msync in {{NameNodeRpcServer}} is because msync is 
part of {{ClientProtocol}}, and NameNodeRpcServer implements it. So changing 
this in NameNodeRpcServer won't compile. I guess msync is a little bit 
different from other RPC calls, in that from client side, it's another RPC call 
like others, but from server side, all it's logic is handled in RPC layer, not 
in NN layer at all.

bq. (4) I was curious about making msync pause ...
Just some thoughts of mine: it is true that there will be no deferred call in 
this case, but seems to me this will also add retry msync calls, so I don't see 
a particular advantage here in terms of call queue load. Actually it maybe even 
worse: if client times out and retries, the previous call could still be in the 
server call queue? In which case we could have multiple msync call in the 
server call queue, many clients doing this msync retry could fill up the queue 
unnecessarily.  While if the client retry interval is long, it can easily be 
just wasting time. 




[jira] [Comment Edited] (HDFS-13767) Add msync server implementation.

2018-08-03 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568843#comment-16568843
 ] 

Chen Liang edited comment on HDFS-13767 at 8/3/18 10:21 PM:


Thanks for the comments, [~zero45].

bq. (2) A concern about the unit test...
Thanks for trying this. I think it might be related to the recent update that 
enabled in-progress edit tailing. If I disable in-progress edit tailing for 
this test and then add the delay you mentioned, the test succeeds, but takes 
longer (as expected). I will need to look deeper to see how in-progress edit 
tailing interacts with this. [~xkrogen], any quick insights on this?

bq. (3) So far these patches don't actually ...
The only reason there is an msync in {{NameNodeRpcServer}} is that msync is 
part of {{ClientProtocol}}, which NameNodeRpcServer implements. So changing 
this in NameNodeRpcServer won't compile. I guess msync is a little bit 
different from other RPC calls: from the client side, it's an RPC call like any 
other, but on the server side, all its logic is handled in the RPC layer, not 
in the NN layer at all.

bq. (4) I was curious about making msync pause ...
Just some thoughts of mine: it is true that there will be no deferred calls in 
this case, but it seems to me this will also add retried msync calls, so I 
don't see a particular advantage here in terms of call queue load. It seems to 
me it is actually the same as deferring the call; the only difference is 
whether we let the client send another call, or have the server treat the 
existing call as a new one (by putting it back in the queue).

Actually, maybe we can have a mixed strategy. We did have some discussion 
offline: if the server sees that it is too far behind when checking the call, 
it doesn't re-queue, but just throws an exception so that the client doesn't 
bother to wait and can try a different Observer. I view this as an 
optimization.
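
The mixed strategy could look roughly like the following sketch. It is not 
taken from any patch on this JIRA: {{Action}}, {{decide}} and the 
{{maxLagToWait}} threshold are hypothetical names, and measuring "too far 
behind" as a plain txid difference is an assumption.

```java
/** Toy decision function for the mixed strategy; not HDFS code. */
class MixedStrategySketch {

  /** What the server does with an incoming call. */
  enum Action { SERVE, REQUEUE, REJECT }

  /**
   * Decides how to handle a call given the server's applied txid and the
   * state id the client has already seen. maxLagToWait is a hypothetical
   * threshold for how far behind the server may be and still defer.
   */
  static Action decide(long serverAppliedTxId, long clientStateId,
      long maxLagToWait) {
    long lag = clientStateId - serverAppliedTxId;
    if (lag <= 0) {
      return Action.SERVE;   // already caught up
    }
    if (lag <= maxLagToWait) {
      return Action.REQUEUE; // slightly behind: put the call back in the queue
    }
    return Action.REJECT;    // too far behind: let the client try elsewhere
  }

  public static void main(String[] args) {
    System.out.println(decide(10, 8, 5));   // prints SERVE
    System.out.println(decide(10, 12, 5));  // prints REQUEUE
    System.out.println(decide(10, 100, 5)); // prints REJECT
  }
}
```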


was (Author: vagarychen):
Thanks for comments [~zero45].

bq. (2) A concern about the unit test...
Thanks for trying this. I think it might be related to the recent update that 
enabled in-progress edit tail. Because if I disable in-progress edit tail for 
this test, then add the delay you mentioned, the test will succeed, but took 
longer (expected). I will need to look deeper to see how in-progress edit tail 
worked with this. [~xkrogen] any fast insights on this?

bq. (3) So far these patches don't actually ...
The only reason there is msync in {{NameNodeRpcServer}} is because msync is 
part of {{ClientProtocol}}, and NameNodeRpcServer implements it. So changing 
this in NameNodeRpcServer won't compile. I guess msync is a little bit 
different from other RPC calls, in that from client side, it's another RPC call 
like others, but from server side, all it's logic is handled in RPC layer, not 
in NN layer at all.
