[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-07 Thread Todd Lipcon (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249181#comment-13249181 ]

Todd Lipcon commented on HDFS-3192:
---

The state diagram is included in the design doc attached to HDFS-2185. Please 
comment with an example scenario in which you think there is an incorrect 
behavior - I don't know of any aside from HADOOP-8217, but if you know of some 
I'd be really happy to address them rather than find out about them from a 
broken customer :)

 Active NN should exit when it has not received a getServiceStatus() rpc from 
 ZKFC for timeout secs
 --

 Key: HDFS-3192
 URL: https://issues.apache.org/jira/browse/HDFS-3192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-06 Thread Hari Mankude (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249099#comment-13249099 ]

Hari Mankude commented on HDFS-3192:


Would it be possible to post a state transition diagram for the failover
controller and its interactions with the NN? There are concerns about the
correctness of situations where zkfc2 is directing nn1 to change states and
vice versa.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Suresh Srinivas (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246059#comment-13246059 ]

Suresh Srinivas commented on HDFS-3192:
---

Hari, agree with Aaron that this should not be a subtask of HDFS-3092.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246661#comment-13246661 ]

Hari Mankude commented on HDFS-3192:


bq. Why?

bq. I think it's an advantage that the FC may die and come back, or that you may
start the FCs after the NNs.

Well, if the FC restarts health monitoring within the timeout period, then the NN
will not die. However, if the FC is in a GC pause or is not restarting, then the
NN should die. This is the first level of protection, wherein, if the NN is
healthy, it can stonith itself.



[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246671#comment-13246671 ]

Todd Lipcon commented on HDFS-3192:
---

Why add multiple stonith paths, given we need external stonith anyway? It just 
adds to the complexity by increasing the number of scenarios we have to debug, 
etc.

That is to say: if the ZKFC dies, then it will lose its lock, and the other 
node will stonith this one when it takes over. What's the benefit of having it 
abort itself at the same time? In fact, it seems to be detrimental, because if 
it stays up, the other node can do a graceful transitionToStandby() call rather 
than having to do something more drastic like a full abort.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246687#comment-13246687 ]

Hari Mankude commented on HDFS-3192:

bq. Why add multiple stonith paths, given we need external stonith anyway? It
just adds to the complexity by increasing the number of scenarios we have to
debug, etc.

I thought we were not going to have external stonith using special devices, and
that is mainly the reason why we are jumping through hoops to implement fencing
in journal daemons.

bq. That is to say: if the ZKFC dies, then it will lose its lock, and the other
node will stonith this one when it takes over. What's the benefit of having it
abort itself at the same time? In fact, it seems to be detrimental, because if
it stays up, the other node can do a graceful transitionToStandby() call rather
than having to do something more drastic like a full abort.

I disagree about two items here.

1. Why is the behaviour different from what happens when zkfc loses the
ephemeral node? Currently, when zkfc loses the ephemeral node, it will shut down
the active NN. Similarly, if the active NN does not hear from zkfc, it implies
that zkfc is dead or going through a GC pause, essentially resulting in loss of
the ephemeral node.

2. If the active NN loses quorum, it has to shut down. There is no way to do a
transitionToStandby(), especially since the log is updated after the NN metadata
is updated and there is no way to roll back the last update. This is just one of
the issues we are aware of where a rollback would be necessary; there might be
other situations where rollback is required. In fact, one of the most difficult
APIs to implement correctly would be transitionToStandby() from the active state.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246718#comment-13246718 ]

Todd Lipcon commented on HDFS-3192:
---

bq. I thought we are not going to have external stonith using special devices
and that is mainly the reason why we are going through hoops to implement
fencing in journal daemons.

In the current design, which uses a filer, we *require* external stonith
devices. There is no correct way of doing it without either stonith or storage
fencing.

The proposal with the journal-daemon based fencing is essentially the same as
storage fencing - just that we do it with our own software storage instead of a
NAS/SAN.

bq. Why is the behaviour different from what happens when zkfc loses the
ephemeral node? Currently zkfc when it loses the ephemeral node will shutdown
the active NN

No, it doesn't - it will transition it to standby. But, as I commented
elsewhere, this is redundant, because the _new_ active is actually going to
fence it anyway before taking over.

bq. Similarly if active NN does not hear from zkfc, it implies that zkfc is
dead, going through gc pause essentially resulting in loss of ephemeral node.

But this can reduce uptime. For example, imagine an administrator accidentally
changes the ACL on zookeeper. This causes both ZKFCs to get an authentication
error and crash at the same time. With your design, both NNs will then commit
suicide. With the existing implementation, the system will continue to run in
its existing state -- i.e. no new failovers will occur, but whoever is active
will remain active.

bq. If active NN loses quorum, it has to shutdown

Yes, it has to shut down _before_ it does any edits, or it has to be fenced by
the next active. Notification of session loss is asynchronous. The same is true
of your proposal. In either case it can take arbitrarily long before it
notices that it should not be active. So we still require that the new active
fence it before it becomes active. So, this proposal doesn't solve any problems.

bq. In fact, one of the most of the difficult APIs to implement correctly would
be transitionToStandby() from active state.

We already have that implemented. It syncs any existing edits, and then stops
allowing new ones. We allow failover from one node to another without aborting,
so long as it's graceful. This is perfectly correct. If we need to do a
non-graceful failover, we fence the node by STONITH or by disallowing further
access to the edit logs (which indirectly causes the node to abort, since
logSync() fails).
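The graceful path described here (sync what is buffered, then refuse new edits) and the fenced-storage failure path can be modelled in a few lines. This is a hypothetical sketch, not the NameNode's actual edit log code: `FenceableStorage`, `EditLogWriter` and their methods are invented names, and storage fencing is reduced to a boolean flag.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for shared edits storage that the next active can fence off. */
final class FenceableStorage {
    private final List<String> durable = new ArrayList<>();
    private boolean fenced;

    void fence() {
        fenced = true;
    }

    void write(List<String> batch) throws IOException {
        if (fenced) {
            throw new IOException("storage is fenced; this writer may not write");
        }
        durable.addAll(batch);
    }

    List<String> durableEdits() {
        return new ArrayList<>(durable);
    }
}

/** Hypothetical model of an NN's edit log writer and its HA state. */
final class EditLogWriter {
    enum State { ACTIVE, STANDBY }

    private final FenceableStorage storage;
    private final List<String> buffered = new ArrayList<>();
    private State state = State.ACTIVE;

    EditLogWriter(FenceableStorage storage) {
        this.storage = storage;
    }

    State state() {
        return state;
    }

    void logEdit(String op) {
        if (state != State.ACTIVE) {
            throw new IllegalStateException("standby does not accept new edits");
        }
        buffered.add(op);
    }

    /** Persists buffered edits; an IOException means the NN must abort. */
    void logSync() throws IOException {
        storage.write(buffered);
        buffered.clear();
    }

    /**
     * Graceful transition: sync existing edits, then stop allowing new ones.
     * A repeated call in standby is a no-op. If the storage has been fenced,
     * the sync throws and the caller should abort rather than go to standby.
     */
    void transitionToStandby() throws IOException {
        if (state == State.STANDBY) {
            return;
        }
        logSync();
        state = State.STANDBY;
    }
}
```

The fenced case is what makes the non-graceful path safe in this model: the writer cannot reach standby without a successful sync, so it is left to abort.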

It seems you're trying to solve problems we've already solved.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246752#comment-13246752 ]

Hari Mankude commented on HDFS-3192:

bq. bq. Why is the behaviour different from what happens when zkfc loses the
ephemeral node? Currently zkfc when it loses the ephemeral node will shutdown
the active NN

bq. No, it doesn't - it will transition it to standby. But, as I commented
elsewhere, this is redundant, because the new active is actually going to fence
it anyway before taking over.

Well, this is incorrect behaviour. It does not handle the situation that I
mentioned earlier about requiring rollbacks. The transition to standby will
result in the old active having incorrect in-memory state. The only way around
this is to shut down the active. The reason is that as soon as zkfc on NN1 has
lost the ephemeral znode, it is possible that zkfc on NN2 has taken over the
znode and NN2 has started fencing the journals. There is no way to gracefully
coordinate this with NN1. This would result in NN1 losing quorum, which in turn
could leave the in-memory state in NN1 in an inconsistent shape. Do you agree
that the in-memory state of NN1 is then inconsistent with the editlogs?

bq. Similarly if active NN does not hear from zkfc, it implies that zkfc is
dead, going through gc pause essentially resulting in loss of ephemeral node.

bq. But this can reduce uptime. For example, imagine an administrator
accidentally changes the ACL on zookeeper. This causes both ZKFCs to get an
authentication error and crash at the same time. With your design, both NNs
will then commit suicide. With the existing implementation, the system will
continue to run in its existing state – i.e no new failovers will occur, but
whoever is active will remain active.

Firstly, how often does someone change ACLs in zookeeper? Secondly, why is ZKFC
dying when this happens? ZKFC must be more robust than the NN. The NN is a
resource that is controlled by ZKFC. We should make zkfc robust enough to handle
zookeeper ACL changes if this is a common occurrence.

bq. If active NN loses quorum, it has to shutdown

bq. Yes, it has to shut down before it does any edits, or it has to be fenced
by the next active. Notification of session loss is asynchronous. The same is
true of your proposal. In either case it can take arbitrarily long before it
notices that it should not be active. So we still require that the new active
fence it before it becomes active. So, this proposal doesn't solve any problems.

My proposal was not meant to handle the active NN losing quorum. My proposal is
to shut down the NN when ZKFC has died or is in a GC pause.

My comment was with regard to the earlier comment about doing a
transitionToStandby(). Do you agree that the active NN has invalid in-memory
state and cannot go through transitionToStandby() when it loses quorum? There
seem to be two solutions:
1. Implement rollback for various types of editlog entries and then do
transitionToStandby(), OR
2. Shut down the NN when it loses quorum.
Does this sound right?

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246778#comment-13246778 ]

Todd Lipcon commented on HDFS-3192:
---

I think there is confusion here over the terminology "loses quorum."

I agree completely about the following: if any NN fails to sync its edit logs,
it needs to abort. This is already what it does today - no changes necessary.
If the edit log happens to be implemented using a quorum protocol (HDFS-3092 or
HDFS-3077 for example), then that behavior should be maintained. The
JournalManager implementation needs to throw an exception in response to
logSync(). That will cause the NN to abort.

That's all that's necessary for correctness - an NN won't ack success to a
write unless it successfully syncs it, and will abort rather than rollback,
since we have no rollback capability.
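A minimal sketch of this rule, assuming a quorum-based JournalManager that treats an edit batch as durable only when a simple majority (n/2 + 1 of n) of journal locations acknowledge it. `QuorumJournal` and `NameNodeWritePath` are hypothetical names, and the per-location outcome is simulated with a predicate rather than real RPCs.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical quorum journal: logSync() succeeds only with a majority of acks. */
final class QuorumJournal {
    private final List<String> locations;
    private final Predicate<String> writeSucceeds;   // simulated per-location outcome

    QuorumJournal(List<String> locations, Predicate<String> writeSucceeds) {
        this.locations = locations;
        this.writeSucceeds = writeSucceeds;
    }

    /** Smallest number of locations that forms a majority of n. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    void logSync() throws IOException {
        int acks = 0;
        for (String loc : locations) {
            if (writeSucceeds.test(loc)) {
                acks++;
            }
        }
        if (acks < majority(locations.size())) {
            throw new IOException("only " + acks + " of " + locations.size() + " journals acked");
        }
    }
}

/** NN side: ack the client only after a successful sync; abort on failure. */
final class NameNodeWritePath {
    private final QuorumJournal journal;
    boolean aborted;   // stands in for terminating the NN process

    NameNodeWritePath(QuorumJournal journal) {
        this.journal = journal;
    }

    /** Returns true if the write may be acknowledged to the client. */
    boolean handleWrite() {
        try {
            journal.logSync();
            return true;
        } catch (IOException e) {
            aborted = true;   // no rollback capability, so abort instead
            return false;
        }
    }
}
```

With three journal locations, one can be down and writes still succeed; with two down, the first sync fails and the NN aborts rather than acknowledging the write.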

In the above sense, "loses quorum" really means "loses write access to the edit
logs."

If instead you're talking about "loses quorum" as "loses its ZK session," then
no abort is necessary, because it may still be able to write to its edits. So
long as it's getting success back from editLog.logSync(), then the edits are
being persisted. It is the responsibility of the next active to fence access to
the shared edits. It may do so in one of two ways:
1) Edits fencing: ensure that the next write to the edits mechanism throws IOE.
In the case of FileJournalManager on NAS, this is done via an RPC to the NAS
system to fence the given export.
2) STONITH: ensure that the next write fails because power has been yanked from
the machine.

Alternatively, the new active may first try a graceful transition:
3) Gracefully ask the prior active to stop writing. The prior active flushes
anything buffered, successfully syncs, and then enters standby mode.
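The ordering requirement (graceful transition first, then fencing, and only then becoming active) can be sketched as below. This does not reproduce Hadoop's `FailoverController` code; `FenceMethod`, `FailoverSketch` and `makeSafeToBecomeActive` are hypothetical, and each fencing method is modelled as a `BooleanSupplier` that reports whether the prior active is now cut off.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** A named, hypothetical fencing method; the action returns true if the old active is cut off. */
final class FenceMethod {
    final String name;
    final BooleanSupplier action;

    FenceMethod(String name, BooleanSupplier action) {
        this.name = name;
        this.action = action;
    }
}

final class FailoverSketch {
    /**
     * Run by the node that just won the ZK lock, before it writes any edits.
     * Returns how the prior active was neutralised, or null if nothing worked,
     * in which case the caller must not become active.
     */
    static String makeSafeToBecomeActive(BooleanSupplier gracefulStandbyRpc,
                                         List<FenceMethod> fencers) {
        // 3) Graceful: ask the prior active to flush, sync and enter standby.
        if (gracefulStandbyRpc.getAsBoolean()) {
            return "graceful";
        }
        // 1) / 2) Non-graceful: edits fencing or STONITH, tried in configured order.
        for (FenceMethod f : fencers) {
            if (f.action.getAsBoolean()) {
                return f.name;
            }
        }
        return null;
    }
}
```

The point of returning `null` is that a node which cannot confirm the prior active is standby or fenced stays out of the active role entirely.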

Notably, self-stonith upon losing the ZK lease is not an option, because it
may take arbitrarily long before it notices. E.g.:
1) NN1 writing to edits log
2) ZKFC1 loses lease, but doesn't know about it yet
3) ZKFC2 gets lease
4) NN2 becomes active, starts writing logs
5) NN1 writes some edits. World explodes.
6) ZKFC1 gets asynchronous notification from ZK that it lost its session.
Anything you do at this point is _too late_.

Before step 4, NN2 must use a fencing mechanism. *Regardless* of whatever steps
NN1 or ZKFC1 might take in step 6.
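The timeline can be replayed against a shared log that fences by writer epoch: the new active claims a higher epoch before step 4, so NN1's write in step 5 fails instead of corrupting the log. Epoch numbers are an assumed fencing mechanism chosen for illustration (one way to realise the edits fencing in option 1 above), and `EpochFencedLog` is a hypothetical class.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical shared edit log fenced by writer epochs: fencing means a new
 * writer claims a higher epoch, after which writes carrying an older epoch fail.
 */
final class EpochFencedLog {
    private long promisedEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    /** Fencing step: called by the new active before it writes anything. */
    synchronized void fence(long newEpoch) {
        if (newEpoch <= promisedEpoch) {
            throw new IllegalArgumentException("epoch must increase");
        }
        promisedEpoch = newEpoch;
    }

    /** A write from a writer whose epoch has been superseded is rejected. */
    synchronized void write(long writerEpoch, String op) throws IOException {
        if (writerEpoch < promisedEpoch) {
            throw new IOException("writer epoch " + writerEpoch + " fenced by " + promisedEpoch);
        }
        edits.add(op);
    }

    synchronized List<String> edits() {
        return new ArrayList<>(edits);
    }
}
```

Replaying the steps: NN1 fences with epoch 1 and writes (step 1); NN2 fences with epoch 2 before writing (between steps 3 and 4); NN1's later write with epoch 1 (step 5) throws, which in the NN would trigger an abort, regardless of when ZKFC1 hears about its session (step 6).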

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246805#comment-13246805 ]

Hari Mankude commented on HDFS-3192:

Excellent comment regarding quorum and active aborting when it cannot write to
(n/2 +1) number of editlog entries. I was getting worried that
transitionToStandby() without NN restart was being suggested in this scenario
also. Thanks for clarifying this for me.

Before we get back to the "ZKFC dead implies NN should die" situation, let us
understand a specific scenario here:

bq. 1) NN1 writing to edits log
bq. 2) ZKFC1 loses lease, but doesn't know about it yet
bq. 3) ZKFC2 gets lease
bq. 4) NN2 becomes active, starts writing logs
bq. 5) NN1 writes some edits. World explodes.
bq. 6) ZKFC1 gets asynchronous notification from ZK that it lost its session.
Anything you do at this point is too late.

Let us assume that NN1 has no edit logs to write. The reason could be that we
also implement IP failover, and DFSClients automatically start talking to NN2
after the IP failover. No edit log entries are being created at NN1. So, NN1
stays active and never behaves as a standby.

So in step #6, irrespective of when ZKFC1 gets the notification, ZKFC1 has to
restart NN1. Otherwise, we don't know how long NN1 will stay in limbo.

Now, coming to the scenario that I was referring to: ZKFC on NN1 is either dead
or in a GC pause. NN1 could similarly remain stuck as active. Also, NN1 could
resign much earlier without having to go through an uncontrolled abort via
fencing. It is always a much stronger correctness argument when NN1 can
self-resign with the information that is available rather than being forced to
resign.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246806#comment-13246806 ]

Hari Mankude commented on HDFS-3192:


bq. Excellent comment regarding quorum and active aborting when it cannot write
to (n/2 +1) number of editlog entries.

Should say editlog locations.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246818#comment-13246818 ]

Todd Lipcon commented on HDFS-3192:
---

bq. So in step #6, irrespective of when ZKFC1 gets the notification, ZKFC1 has
to restart NN1. Otherwise, we don't know as to how long NN1 will stay in limbo.

Can you explain why it has to restart, instead of just transitioning to
standby? What do you mean by "in limbo" here?

bq. Also, NN1 could resign much earlier without having go through uncontrolled
abort via fencing

Before issuing an uncontrolled abort, ZKFC2 will always try to do a
graceful fence -- i.e., ask it to self-resign via an RPC. See the
{{tryGracefulFence}} function in the {{FailoverController}} class.

Having the other node ask it to resign is better than having it ask itself
to resign -- the reason being that this is the only way the other node can be
sure that it's in the clear to start writing to the logs (a
self-resignation might come too late). Since the other node always has to
verify the resignation before it starts to write, there's nothing extra gained
by having it resign itself first. It's just a redundancy.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246852#comment-13246852 ]

Hari Mankude commented on HDFS-3192:

bq. Can you explain why it has to restart, instead of just transitioning to
standby? What do you mean by "in limbo" here?

"In limbo" means that NN1 thinks that it is active even though NN2 has taken
over, since NN1 has not tried to access the editlogs. So, it is not behaving as
a standby and keeping up with the active. Are you suggesting that ZKFC1 does
transitionToStandby() when it loses the znode? On an active NN, there is a high
probability that it might abort. Also, does transitionToStandby() guarantee
that all the active-state threads have quiesced?

bq. Before issuing an uncontrolled abort, the ZKFC2 will always try to do a
graceful fence -- ie ask it to self-resign via an RPC. See the
tryGracefulFence function in the FailoverController class.

I don't think that doing tryGracefulFence() from NN2 to NN1 is safe. First of
all, this opens up one more channel of communication between NN1 and NN2,
which is subject to various race sequences, split-brain, etc. I think
self-resign is much safer than tryGracefulFence(). So far, I don't see an
argument in our discussion showing that self-resign is incorrect. Is my
description correct here?

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246865#comment-13246865 ]

Todd Lipcon commented on HDFS-3192:
---

bq. Are you suggesting that ZKFC1 does transitionToStandby() when it loses
znode?

It currently does, but I don't think it has to -- since ZKFC2 will call
NN1.transitionToStandby (see below).

bq. On an active NN, there is a high probability that it might abort

There are two possible scenarios:
1) *the local node finds out about the session expiration before the standby.*
In this case, it will call transitionToStandby, the local node will flush and
close its edit logs, and gracefully transition.
2) *the local node finds out after the other node.* In this case, the other
node will have already initiated the fencing process.
2a) If the local node is still accessible, then the other node will have
already called transitionToStandby(), in which case our own call will be a
no-op (since we're already in standby state). Everything is correct, because
the transitionToStandby() call flushes everything and gracefully closes its
edit log writer.
2b) If the local node is inaccessible (e.g. network down), then the other node
initiates non-graceful fencing. If it does STONITH, then our node will go down,
and the discussion is moot. If it does storage fencing, then our node no longer
has access to write to storage. This will prevent transitionToStandby() from
succeeding, since it will try to finalize its current edit log segment (which
involves mutating the fenced-off storage). So, it will correctly abort.

bq. I don't think that doing tryGraceFulFence() from NN2 to NN1 is safe. First
of all, this is opening up one more channel of communication between NN1 and
NN2 and this is subject to various races sequences, split-brain etc.

Doing RPC to your own NN is subject to way more race conditions because we have
no way of enforcing an ordering between NN1 going standby and NN2 becoming
active. NN2 *has* to verify that NN1 is either standby or effectively dead
before becoming active. The only way to do that is to first (a) ask it to be
standby, or (b) fence.

The lack of correctness in relying on self-resign is the example I gave above:

{quote}
1) NN1 writing to edits log
2) ZKFC1 loses lease, but doesn't know about it yet
3) ZKFC2 gets lease
4) NN2 becomes active, starts writing logs
5) NN1 writes some edits. World explodes.
6) ZKFC1 gets asynchronous notification from ZK that it lost its session.
Anything you do at this point is too late.
{quote}

The self-resign in step 6 is insufficient. We have to fence between step 3
and step 4. Whatever NN1 happens to do _after_ that point doesn't help anything
because it's too late.

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246868#comment-13246868 ]

Todd Lipcon commented on HDFS-3192:
---

Maybe the confusion is about readers? The one hole we have now is that, if 
there are no writes happening on the active, then it may happily continue to 
think it's active until the next write. I'd be in favor of solving this by 
adding code which writes a no-op edit to the edit log once every second or 
so. This bounds the amount of time in which it may think it's active and 
respond to read requests -- since the no-op edit will cause an abort if it 
loses its write access. Does that satisfy the issue you're raising?
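A sketch of the proposed periodic no-op edit, assuming a hypothetical `NoOpEditSink` hook that writes and syncs a no-op through the edit log and throws `IOException` once write access is lost. The "once every second or so" period becomes the `periodMillis` argument; `tick()` is exposed so the behaviour can be exercised without waiting on a timer. `NoOpEditHeartbeat` is likewise a hypothetical name.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical hook: log and sync a no-op edit; throws if write access is lost. */
interface NoOpEditSink {
    void logSyncNoOp() throws IOException;
}

/**
 * Hypothetical heartbeat: an otherwise idle active NN logs a no-op edit every
 * period, so losing write access is noticed within roughly one period.
 */
final class NoOpEditHeartbeat {
    private final NoOpEditSink sink;
    private final Runnable abort;       // e.g. terminate the NN process
    private volatile boolean aborted;

    NoOpEditHeartbeat(NoOpEditSink sink, Runnable abort) {
        this.sink = sink;
        this.abort = abort;
    }

    /** One heartbeat; called by the scheduler (or directly in a test). */
    void tick() {
        if (aborted) {
            return;
        }
        try {
            sink.logSyncNoOp();
        } catch (IOException e) {
            aborted = true;
            abort.run();
        }
    }

    /** Schedules tick() at a fixed rate. An unchecked exception would stop further runs. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::tick, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

The bound on how long a fenced NN keeps serving reads as active is then roughly the period plus the time for one failed sync.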

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Hari Mankude (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246949#comment-13246949 ]

Hari Mankude commented on HDFS-3192:

bq. 2a) If the local node is still accessible, then the other node will have
already called transitionToStandby(), in which case our own call will be a
no-op (since we're already in standby state). Everything is correct, because
the transitionToStandby() call flushes everything and gracefully closes its
edit log writer.

This is very dangerous and can result in all sorts of races.

1. ZKFC2 initiates transitionToStandby() to NN1.
2. Meanwhile, the RPC has not yet started executing on NN1.
3. ZKFC2 loses the znode.
4. ZKFC1 now takes over.
5. ZKFC1 does a becomeActive() on NN1.
6. transitionToStandby() starts executing, converting NN1 to standby. There are
no active NNs in the cluster now.
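The interleaving in steps 1 to 6 can be replayed with a toy state model in which NN1's incoming RPCs sit in a queue until they run. This only illustrates the race as described; `DelayedRpcRace` and its nested types are hypothetical, and the model includes none of the checks a real RPC path might perform.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy replay of the delayed-RPC interleaving (not Hadoop code). */
final class DelayedRpcRace {
    enum HaState { ACTIVE, STANDBY }

    static final class NameNode {
        HaState state;

        NameNode(HaState initial) {
            state = initial;
        }

        void transitionToStandby() {
            state = HaState.STANDBY;
        }

        void transitionToActive() {
            state = HaState.ACTIVE;
        }
    }

    /** Replays steps 1-6 and returns how many NNs end up active. */
    static int replay() {
        NameNode nn1 = new NameNode(HaState.ACTIVE);
        NameNode nn2 = new NameNode(HaState.STANDBY);
        Deque<Runnable> nn1RpcQueue = new ArrayDeque<>();

        // 1-2. ZKFC2 sends transitionToStandby() to NN1, but it has not run yet.
        nn1RpcQueue.add(nn1::transitionToStandby);
        // 3. ZKFC2 loses the znode, so NN2 stays standby.
        // 4-5. ZKFC1 takes over and makes its local NN1 active.
        nn1.transitionToActive();
        // 6. The delayed RPC finally executes and demotes NN1.
        while (!nn1RpcQueue.isEmpty()) {
            nn1RpcQueue.poll().run();
        }

        int active = 0;
        for (NameNode nn : new NameNode[] {nn1, nn2}) {
            if (nn.state == HaState.ACTIVE) {
                active++;
            }
        }
        return active;
    }
}
```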

ZKFC should communicate ONLY with its local NN. Otherwise, it will result in
all sorts of messy race conditions. Communication between NNs should be via
zookeeper znodes, editlogs and datanodes.

bq. 1) NN1 writing to edits log
2) ZKFC1 loses lease, but doesn't know about it yet
3) ZKFC2 gets lease
4) NN2 becomes active, starts writing logs
5) NN1 writes some edits. World explodes.
6) ZKFC1 gets asynchronous notification from ZK that it lost its session.
Anything you do at this point is too late.

bq.Doing RPC to your own NN is subject to way more race conditions because we
have no way of enforcing an ordering between NN1 going standby and NN2 becoming
active. NN2 has to verify that NN1 is either standby or effectively dead before
becoming active. The only way to do that is to first (a) ask it to be standby,
or (b) fence.

I disagree. Doing an RPC to your own NN is the safest mechanism available in
the HA environment, and it is definitely safer than doing the RPC to a remote
NN. Do you agree?
I want to make clear that I still consider fencing to be required; I am not
suggesting this method as an alternative to fencing. Rather, this method
reduces the number of situations in which the complicated fencing algorithm
has to be used, and so lowers the probability of error.

bq. The self-resign in step 6 is insufficient. We have to fence between step
3 and step 4. Whatever NN1 happens to do after that point doesn't help anything
because it's too late.

I am not talking about self-resign in this situation. Per this jira,
self-resign will happen only if ZKFC1 is dead. In the above example, ZKFC1 is
not dead.
For the above example, ZKFC1 should abort NN1 as soon as the znode state
change has happened, and then restart NN1.
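
The proposal that ZKFC1 abort and restart its local NN when the znode state changes can be sketched as a callback. This is a hypothetical sketch: LocalNameNodeHandle and on_znode_lost are illustrative names, not ZKFC APIs. As the quoted step 6 points out, the notification may arrive after NN2 is already active, so this complements fencing rather than replacing it.

```python
class LocalNameNodeHandle:
    """Hypothetical handle a ZKFC holds on its *local* NN process."""

    def __init__(self, active: bool) -> None:
        self.active = active
        self.restarts = 0

    def abort_and_restart(self) -> None:
        # Kill the process so it cannot write further edits, then start it
        # again; an HA NN that is freshly started comes up in standby state.
        self.active = False
        self.restarts += 1


def on_znode_lost(local_nn: LocalNameNodeHandle) -> None:
    """Hypothetical ZKFC callback run when it learns it no longer holds the znode."""
    # Only the local NN is touched, in line with "ZKFC should communicate
    # ONLY with its local NN".
    if local_nn.active:
        local_nn.abort_and_restart()
```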

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-03 Thread Hari Mankude (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245889#comment-13245889
 ] 

Hari Mankude commented on HDFS-3192:


The FC, via its HealthMonitor, periodically polls the status of the NN with
getServiceStatus(). If the active NN has not heard from the FC for a timeout
number of seconds, it should exit. This timeout has to be tied to the timeout
of the ephemeral node in ZooKeeper, and should be a fraction of the ephemeral
node timeout.

This ensures that if the FC has died, the active NN also fails fast.
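
The proposed NN-side check can be sketched as a watchdog that records the time of the last getServiceStatus() call and exits the active NN once that age exceeds a fraction of the ZooKeeper ephemeral node (session) timeout. This is a minimal sketch under stated assumptions: ServiceStatusWatchdog, the 0.5 default fraction and the exit_fn hook are hypothetical, not Hadoop code or configuration.

```python
import threading
import time
from typing import Callable


class ServiceStatusWatchdog:
    """Hypothetical NN-side watchdog: the active NN exits if the FC stops polling."""

    def __init__(self, zk_session_timeout_s: float, fraction: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not 0 < fraction < 1:
            raise ValueError("fraction must be in (0, 1)")
        # A fraction of the ephemeral node timeout, so that the NN gives up
        # roughly before a dead FC's znode expires and another NN can take over.
        self.timeout_s = zk_session_timeout_s * fraction
        self._clock = clock
        self._lock = threading.Lock()
        self._last_poll = clock()

    def on_get_service_status(self) -> None:
        """Call from the getServiceStatus() RPC handler."""
        with self._lock:
            self._last_poll = self._clock()

    def expired(self) -> bool:
        with self._lock:
            return self._clock() - self._last_poll > self.timeout_s

    def check(self, is_active: bool, exit_fn: Callable[[int], None]) -> None:
        # Only the active NN fails fast; a standby has no edits to protect.
        if is_active and self.expired():
            exit_fn(1)
```

In use, the RPC handler would call on_get_service_status(), and a periodic timer thread would call check(is_active, sys.exit). With this design the active NN also exits if the FC is merely restarted or started after the NN, which is the trade-off questioned in the following comment.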

[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-03 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245895#comment-13245895
 ] 

Todd Lipcon commented on HDFS-3192:
---

Why?

I think it's an _advantage_ that the FC may die and come back, or that you may 
start the FCs after the NNs.
