[
https://issues.apache.org/jira/browse/BOOKKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923636#comment-13923636
]
Rakesh R commented on BOOKKEEPER-733:
-------------------------------------
Hi All,
Following are a few cases where the target bookie is not able to proceed
with the re-replication procedure.
I'm trying to put together all such cases we have come across. My intention is
simply to make everyone aware of these cases, and I hope this will help us
reach a common conclusion.
+Case-1)+ Already have a replica (the bookie is part of all the ledger fragments)
+Case-2)+ BKException - BKReadException, BKBookieHandleNotAvailableException
- quorum lost (thanks a lot, Ivan, for bringing up this scenario, where the
ledger loses the quorum and hangs around waiting for re-replication)
- slow bookies not returning enough responses, etc.
+Case-3)+ Other BKExceptions (if anything requires special attention).
Please see the initial draft proposal below, where I'm trying to address the
cases very specifically by introducing different return codes. One reason for
specific handling is that these exceptions are known to the AutoRecovery module,
which can easily build intelligence on top of them. For example, I can use the
ZK watch notification mechanism, or wait for configured retry intervals to
trigger a recheck, etc. I agree that specific handling should not leave any
loopholes.
I'd like to see feedback, and if everyone agrees, I will explore this approach further.
*Proposal:*
Introduce a return code when releasing the lock, e.g.:
LedgerUnderreplicationManager#releaseUnderreplicatedLedger(ledgerId, rc)
ReturnCode:REPLICA_EXISTS
ReturnCode:READ_FAILURE
ReturnCode:FAILED
ReturnCode:OK
Based on the rc, ZkLedgerUnderreplicationManager can build intelligence to
handle specific cases.
Say, ZkLedgerUnderreplicationManager will maintain a map, say 'visitedLedgers' -
<ReturnCode vs ListOfLedgers>
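To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the return codes and the 'visitedLedgers' map. All names (UnderreplicationSketch, recordRelease, shouldSkip) are illustrative only, not the actual BookKeeper API:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class UnderreplicationSketch {

    // Proposed return codes passed to releaseUnderreplicatedLedger(ledgerId, rc)
    public enum ReturnCode {
        OK,             // fragment successfully re-replicated
        REPLICA_EXISTS, // this bookie already holds a replica of all fragments (Case-1)
        READ_FAILURE,   // BKReadException / BKBookieHandleNotAvailableException (Case-2)
        FAILED          // other BKExceptions (Case-3)
    }

    // visitedLedgers: ReturnCode -> set of ledger ids released with that code
    private final Map<ReturnCode, Set<Long>> visitedLedgers =
            new EnumMap<>(ReturnCode.class);

    // Record the code the RW reported when it released the ledger lock.
    public void recordRelease(long ledgerId, ReturnCode rc) {
        visitedLedgers
                .computeIfAbsent(rc, k -> ConcurrentHashMap.newKeySet())
                .add(ledgerId);
    }

    // getLedgerToRereplicate() would consult the non-OK buckets and skip
    // ledgers this bookie cannot usefully work on right now.
    public boolean shouldSkip(long ledgerId) {
        return visitedLedgers.getOrDefault(ReturnCode.REPLICA_EXISTS, Set.of()).contains(ledgerId)
            || visitedLedgers.getOrDefault(ReturnCode.READ_FAILURE, Set.of()).contains(ledgerId);
    }
}
```

The cases below describe how the entries in each bucket get cleaned up again.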
# *Case-1)* Already have a replica (the bookie is part of all the ledger fragments)
Add/update a collection representing 'existingLedgers' in
ZkLedgerUnderreplicationManager and put the entry into 'visitedLedgers'.
_RW Thread:_
step-1) On receiving the rc, add the ledger to this list.
step-2) Add a watcher to this ledger for further cleanups.
step-3) In getLedgerToRereplicate(), consult 'existingLedgers'
and skip this ledger for now, so unnecessary looping is avoided for
this ledger.
_ZK Watcher Thread:_
Now, on any NodeDeleted/NodeDataChanged event, it will remove the ledger from
the list, on the grounds that its state may have changed while it still exists
as underreplicated. The RW will then be able to recheck whether any fragments
can be re-replicated to this bookie.
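The Case-1 flow above can be sketched as follows. This is a self-contained illustration (no real ZooKeeper dependency); ExistingLedgersTracker, onZkEvent, and the ZkEvent enum are hypothetical stand-ins for the real ZK watcher wiring:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ExistingLedgersTracker {

    // Stand-in for the ZK event types the watcher would react to.
    public enum ZkEvent { NODE_DELETED, NODE_DATA_CHANGED, OTHER }

    private final Set<Long> existingLedgers = ConcurrentHashMap.newKeySet();

    // RW thread, step-1/2: on REPLICA_EXISTS, remember the ledger; the real
    // code would also register a ZK watch on the ledger's urLedger znode.
    public void onReplicaExists(long ledgerId) {
        existingLedgers.add(ledgerId);
    }

    // RW thread, step-3: getLedgerToRereplicate() skips these ledgers, so the
    // worker does not loop on fragments it already holds.
    public boolean shouldSkip(long ledgerId) {
        return existingLedgers.contains(ledgerId);
    }

    // ZK watcher thread: NodeDeleted/NodeDataChanged means the ledger's
    // replication state changed, so drop it from the skip list and recheck.
    public void onZkEvent(long ledgerId, ZkEvent event) {
        if (event == ZkEvent.NODE_DELETED || event == ZkEvent.NODE_DATA_CHANGED) {
            existingLedgers.remove(ledgerId);
        }
    }
}
```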
# *Case-2)* BKException - BKReadException, BKBookieHandleNotAvailableException
Add/update a collection representing 'errLedgers' in
ZkLedgerUnderreplicationManager and put the entry into 'visitedLedgers'.
_RW Thread:_
step-1) On receiving the rc, add the ledger to this list.
step-2) Add a watcher to this ledger for further cleanups.
step-3) The idea here is to postpone the ledger's replication by some
interval. Define the next time at which this ledger should be reconsidered
for re-replication. Once an errLedger reaches that interval, simply
remove it from 'errLedgers' so that it becomes available for the
re-replication phase again.
step-4) In getLedgerToRereplicate(), consult 'errLedgers' and
skip this ledger for now, so unnecessary looping is avoided for this
ledger.
_ZK Watcher Thread:_
Now, on any NodeDeleted/NodeDataChanged event, it will remove the
ledger from the list, on the grounds that the ledger can be rechecked. This
happens when the ledger has been re-replicated by another bookie, or when the
Auditor has reported a few more bookie failures for this ledger, etc.
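The Case-2 deferral could look roughly like the sketch below: a ledger that hit a read failure is parked with a "retry at" timestamp, getLedgerToRereplicate() skips it until the configured interval elapses, and a ZK event clears the deferral early. ErrLedgersTracker and its method names are illustrative, as is the choice of wall-clock timestamps:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ErrLedgersTracker {

    private final long retryIntervalMillis;
    // errLedgers: ledgerId -> earliest time (epoch millis) to retry it
    private final Map<Long, Long> errLedgers = new ConcurrentHashMap<>();

    public ErrLedgersTracker(long retryIntervalMillis) {
        this.retryIntervalMillis = retryIntervalMillis;
    }

    // RW thread, step-1/2/3: on READ_FAILURE, postpone the ledger.
    public void onReadFailure(long ledgerId) {
        errLedgers.put(ledgerId, System.currentTimeMillis() + retryIntervalMillis);
    }

    // RW thread, step-4: skip while the deferral is in force; once the
    // interval has passed, drop the entry so the ledger is retried.
    public boolean shouldSkip(long ledgerId) {
        Long retryAt = errLedgers.get(ledgerId);
        if (retryAt == null) {
            return false;
        }
        if (System.currentTimeMillis() >= retryAt) {
            errLedgers.remove(ledgerId);
            return false;
        }
        return true;
    }

    // ZK watcher thread: NodeDeleted/NodeDataChanged clears the deferral
    // early (e.g. another bookie re-replicated it, or the Auditor reported
    // more failed bookies for this ledger).
    public void onLedgerChanged(long ledgerId) {
        errLedgers.remove(ledgerId);
    }
}
```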
# *Case-3)* Other BKExceptions (if anything requires special attention).
As of now, I don't see any extra handling needed for this; it can follow the
same flow as Case-2.
Thanks,
Rakesh
> Improve ReplicationWorker to handle the urLedgers which already have the same
> ledger replica in hand
> -----------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-733
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-733
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Affects Versions: 4.2.2, 4.3.0
> Reporter: Rakesh R
> Assignee: Rakesh R
>
> +Scenario:+
> Step1 : Have three bookies BK1, BK2, BK3
> Step2 : Write ledgers with quorum 2
> Step3 : Unfortunately, BK2 and BK3 both went down for a few moments.
> The following logs flood BK1's autorecovery logs. The RW is trying to
> replicate the ledgers, but it simply skips the fragment and moves to the next
> cycle when it sees a replica already in its own hand. IMO, we should have a
> mechanism in place to avoid these unnecessary cycles.
> {code}
> 2014-02-18 21:47:55,140 - ERROR - [New I/O client boss
> #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id:
> 0x00ba679e]/10.18.170.130:15002, current state CONNECTING :
> java.net.ConnectException: Connection refused: no further information
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
> at
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> 2014-02-18 21:47:55,140 - INFO - 2014-02-18 21:59:33,215 - DEBUG -
> [ReplicationWorker:ReplicationWorker@182] - Target
> Bookie[10.18.170.130:15003] found in the fragment ensemble:
> [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002]
> [ReplicationWorker:PerChannelBookieClient@194] - Connecting to bookie:
> 10.18.170.130:15002
> 2014-02-18 21:47:56,162 - ERROR - [New I/O client boss
> #2-1:PerChannelBookieClient$1@230] - Could not connect to bookie: [id:
> 0x0003f377]/10.18.170.130:15002, current state CONNECTING :
> java.net.ConnectException: Connection refused: no further information
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
> at
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> 2014-02-18 21:59:33,215 - DEBUG - [ReplicationWorker:ReplicationWorker@182]
> - Target Bookie[10.18.170.130:15003] found in the fragment ensemble:
> [10.18.170.130:15003, 10.18.170.130:15001, 10.18.170.130:15002]
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)