[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242095#comment-13242095
 ] 

Rakesh R commented on BOOKKEEPER-126:
-------------------------------------

So if I understand the conclusion correctly, we have discussed and identified 
two cases to be implemented as part of this jira:
# *When ledger flushing fails with an IOException?*
    +Soln+ r-o mode:
    >> On an IOE, a bookie with multiple ledger dirs (say /tmp/bk1-data, 
/tmp/bk2-data, etc.) should try the next ledger dir for writing and mark the 
failed dirs as BAD_FOR_WRITE. If no dir succeeds, switch to r-o mode.
    >> Also, if the journal fails with an IOE, switch to r-o mode immediately.
    Shall I open a subtask for the implementation?
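A rough sketch of the dir failover in case 1 could look like this (the class 
and method names here are mine, not the actual bookie code; the real 
implementation would hang off the bookie's ledger dir management):

```java
import java.io.File;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: on a write IOE, mark the dir BAD_FOR_WRITE, fall back
// to the next ledger dir, and switch the bookie to r-o mode when none is left.
class LedgerDirFailover {
    private final List<File> ledgerDirs;
    private final Set<File> badForWrite = new HashSet<>();
    private boolean readOnly = false;

    LedgerDirFailover(List<File> ledgerDirs) {
        this.ledgerDirs = ledgerDirs;
    }

    // Called after an IOException on failedDir; returns the next writable dir,
    // or null once the bookie has switched to read-only mode.
    File onWriteFailure(File failedDir) {
        badForWrite.add(failedDir);
        for (File dir : ledgerDirs) {
            if (!badForWrite.contains(dir)) {
                return dir;
            }
        }
        readOnly = true; // no writable ledger dirs left
        return null;
    }

    boolean isReadOnly() {
        return readOnly;
    }
}
```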
# *Ledger entries got corrupted due to disk failures or bad sectors?*
   +Soln+ scanner approach:
   IMHO, the healing procedure would be the following sequence:
   * a) Perform a scan and prepare the entry-ownership map:
    >> On startup the bookie would contact ZK for the ledger metadata, and on 
every write it would update the ledger metadata map.
    >> A special data structure <ledgerDirId, <entryId, replica bookies>> needs 
to be designed for this, containing the ledgerId, the owned entries, the 
ledger dirs, etc.?
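For the <ledgerDirId, <entryId, replica bookies>> structure in (a), a minimal 
sketch could be the following (all names are hypothetical, not an agreed 
design):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical ownership map: ledger dir -> (entryId -> replica bookies).
// Seeded from the ZK ledger metadata at startup, updated on every write.
class EntryOwnershipMap {
    private final Map<String, NavigableMap<Long, List<String>>> byDir = new HashMap<>();

    void recordWrite(String ledgerDir, long entryId, List<String> replicaBookies) {
        byDir.computeIfAbsent(ledgerDir, d -> new TreeMap<>())
             .put(entryId, new ArrayList<>(replicaBookies));
    }

    List<String> replicasFor(String ledgerDir, long entryId) {
        NavigableMap<Long, List<String>> entries = byDir.get(ledgerDir);
        return entries == null ? Collections.emptyList()
                               : entries.getOrDefault(entryId, Collections.emptyList());
    }
}
```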

   * b) Read the entries and identify any missing entries:
   Yes, the DistributionScheduling happens on the client side, and batch 
reading is also good.
   Since the ledgers are local to the server, I am thinking: how about reading 
them directly instead of going through PerChannelBookieClient?
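The local scan in (b) could be sketched as follows (the LocalEntryReader hook 
is hypothetical; in practice it would sit on top of the local ledger storage 
rather than PerChannelBookieClient):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook over local ledger storage.
interface LocalEntryReader {
    boolean canRead(long ledgerId, long entryId);
}

// Scan the expected entry range directly on the local disk and collect the
// entries that can no longer be read.
class MissingEntryScanner {
    static List<Long> findMissing(LocalEntryReader reader, long ledgerId,
                                  long firstEntry, long lastEntry) {
        List<Long> missing = new ArrayList<>();
        for (long entryId = firstEntry; entryId <= lastEntry; entryId++) {
            if (!reader.canRead(ledgerId, entryId)) {
                missing.add(entryId);
            }
        }
        return missing;
    }
}
```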

   * c) Initiate re-replication:
   The corrupted bookie first identifies a peer bookie that has the copy and 
sends it a notification to re-replicate. Here it could use ZK watchers for the 
notification: each bookie would listen on a specific persistent znode, say 
'underreplicaEntries'. The corrupted bookie would then update the data 
<ledgerId, missingEntryIds> on the 'underreplicaEntries' znode of the 
corresponding bookie that has the copy. On notification, the peer bookie would 
reuse the DistributionScheduling logic that lives on the client side. 
Is it legal for the server to depend on the client? Otherwise, the server 
could randomly select a re-replica bookie and update the ZK ledger metadata.
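The notification data in (c) would need some encoding for the 
'underreplicaEntries' znode; a throwaway sketch (the "ledgerId:e1,e2,..." 
string format is purely my assumption, not an agreed wire format):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoding for the <ledgerId, missingEntryIds> data that a
// corrupted bookie could write to the peer's 'underreplicaEntries' znode.
class UnderReplicaPayload {
    static String encode(long ledgerId, List<Long> missingEntryIds) {
        StringBuilder sb = new StringBuilder().append(ledgerId).append(':');
        for (int i = 0; i < missingEntryIds.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(missingEntryIds.get(i));
        }
        return sb.toString();
    }

    static List<Long> decodeMissingEntries(String payload) {
        List<Long> entries = new ArrayList<>();
        String ids = payload.substring(payload.indexOf(':') + 1);
        if (!ids.isEmpty()) {
            for (String id : ids.split(",")) {
                entries.add(Long.parseLong(id));
            }
        }
        return entries;
    }
}
```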

How would the ZK ledger metadata ('nextReplicaIndexToReadFrom') look after 
re-replication?
   For example, say the ledger metadata mapping for entries 0-100 is:
   0 (A, B, C)
   50 (B, C, D)
   End Ledger: 100
   
   Assume entries 30 to 39 got corrupted on B and were re-replicated to E. 
Would it then look like this?
   0 (A, B, C)
   30 (E, B, C)
   40 (B, C, D)
   50 (B, C, D)
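One way to picture the re-segmentation question above: split the ensemble map 
at the corrupted range and swap the bad bookie for the new one. This is a 
hypothetical helper, not the actual LedgerMetadata code, and swapping B 
in-place within the range is just one possible answer to the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: re-segment an (startEntry -> ensemble) map when the
// entries [fromEntry, toEntry] of badBookie are re-replicated to newBookie.
class EnsembleResegmenter {
    static NavigableMap<Long, List<String>> rereplicate(
            NavigableMap<Long, List<String>> segments,
            long fromEntry, long toEntry, String badBookie, String newBookie) {
        NavigableMap<Long, List<String>> result = new TreeMap<>(segments);
        // Ensure segment boundaries exist at fromEntry and at toEntry + 1.
        for (long boundary : new long[] { fromEntry, toEntry + 1 }) {
            Map.Entry<Long, List<String>> floor = result.floorEntry(boundary);
            if (floor != null && floor.getKey() != boundary) {
                result.put(boundary, new ArrayList<>(floor.getValue()));
            }
        }
        // Swap the bad bookie for the new one inside the affected range.
        for (Map.Entry<Long, List<String>> seg
                : result.subMap(fromEntry, true, toEntry, true).entrySet()) {
            List<String> ensemble = new ArrayList<>(seg.getValue());
            int idx = ensemble.indexOf(badBookie);
            if (idx >= 0) {
                ensemble.set(idx, newBookie);
            }
            seg.setValue(ensemble);
        }
        return result;
    }
}
```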

If you agree with the above approaches, I can probably do a detailed write-up.


   @Sijie
   another tough thing is we need to tell closed ledger from opened/in-recovery 
ledger, when handling last ensemble of opened/in-recovery ledger.

   I think I am missing something; could you give more details on this?
                
> EntryLogger doesn't detect when one of it's logfiles is corrupt
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-126
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-126
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.1.0
>
>
> If an entry log is corrupt, the bookie will ignore any entries past the 
> corruption. Quorum writes stops this being a problem at the moment, but we 
> should detect corruptions like this and rereplicate if necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
