[ https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944354#comment-13944354 ]
Rakesh R commented on ZOOKEEPER-1502:
-------------------------------------
Thanks [~michim] for the comments. I will do 1 and 3.
For the 2nd point,
bq. consider Ted's suggestion of using PID to detect stale lock files.
A few cases that come to mind:
# Normal JVM termination - FileLock#release() would release the lock and
File#deleteOnExit() would delete the lock file. Another server can then
acquire the lock.
# Abnormal JVM exit (kill) - Here the previous 'in_use.lock' file is left
behind as an orphan. Since the OS drops the lock when the process dies, another
server that comes and tries to acquire it will be able to do so.
# JVM running without releasing the lock (probably the embedded zkserver case) -
I feel that if this case exists, it would be a functional issue in our lock
implementation.
# Lock is acquired by a non-zk process - In our case only the zk server is
interested in acquiring the 'in_use.lock' file. Do we need to consider external
processes acquiring in_use.lock?
Are there any more cases that would result in a stale lock?
AFAIK FileLock itself will let us acquire the lock only if no other JVM
holds it, so I feel no special handling using PID checks is required.
Does this sound good to you?
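For illustration, here is a minimal sketch of the FileLock-based approach
described above. The class name DataDirLock and its methods are hypothetical
and are not taken from the attached patch; only the 'in_use.lock' file name
and the use of FileLock and File#deleteOnExit() come from this discussion.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical helper, for illustration only (not the code in the patch).
final class DataDirLock {
    private final File lockFile;
    private RandomAccessFile raf;
    private FileLock lock;

    DataDirLock(File dataDir) {
        this.lockFile = new File(dataDir, "in_use.lock");
    }

    /** Returns true if the lock was acquired, false if it is already held. */
    boolean tryAcquire() throws IOException {
        raf = new RandomAccessFile(lockFile, "rw");
        try {
            // Returns null when another process holds an overlapping lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown when this same JVM already holds the lock
            // (for example, two embedded servers in one process).
            lock = null;
        }
        if (lock == null) {
            // Per the FileLock javadoc, on some systems closing a channel
            // may release locks this JVM holds on the same file.
            raf.close();
            raf = null;
            return false;
        }
        // Register cleanup only once we own the lock, so a failed attempt
        // never deletes a file that another server is using.
        lockFile.deleteOnExit();
        return true;
    }

    /** Releases the lock and closes the file; safe to call more than once. */
    void release() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
        }
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}
```

In this sketch a leftover 'in_use.lock' file from a killed JVM does not block
a later tryAcquire(), because what is checked is the lock and not the
existence of the file.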
> Prevent multiple zookeeper servers from using the same data directory
> ---------------------------------------------------------------------
>
> Key: ZOOKEEPER-1502
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1502
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.3
> Reporter: Will Johnson
> Assignee: Rakesh R
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1502.patch
>
>
> We recently ran into an issue where two zookeeper servers which were a part
> of two separate quorums were configured to use the same data directory.
> Interestingly, the zookeeper servers did not seem to complain and both seemed
> to work fine until one of them was restarted. Once that happened all sorts of
> chaos ensued. I understand that this is a misconfiguration, but should
> zookeeper complain about this or do users need to protect themselves in some
> external fashion? Is a simple file lock enough or are there other things I
> should take into consideration if it's up to me to handle?
--
This message was sent by Atlassian JIRA
(v6.2#6252)