[jira] [Assigned] (BOOKKEEPER-1044) Entrylogger is not readding rolled logs back to the logChannelsToFlush list when exception happens while trying to flush rolled logs

Sijie Guo (JIRA) Mon, 24 Jul 2017 17:49:22 -0700

     [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sijie Guo reassigned BOOKKEEPER-1044:
-------------------------------------

    Assignee: Sijie Guo  (was: Charan Reddy Guttapalem)

I attached a simple fix at https://github.com/apache/bookkeeper/pull/286

> Entrylogger is not readding rolled logs back to the logChannelsToFlush list 
> when exception happens while trying to flush rolled logs
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-1044
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-1044
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Charan Reddy Guttapalem
>            Assignee: Sijie Guo
>            Priority: Blocker
>             Fix For: 4.5.0
>
>
> SyncThread.checkpoint(Checkpoint checkpoint) (which is called periodically by 
> SyncThread's executor for every flushInterval) ultimately calls 
> EntryLogger.flushRotatedLogs.  
> In EntryLogger.flushRotatedLogs, first we set 'logChannelsToFlush' to null 
> and then we try to flush and close individual file. Now, if IOException 
> happens while trying to flush/close the logchannel, then exception is thrown 
> as it is and it get propagates back upto SyncThread.checkpoint. Here we catch 
> that IOException, log it and return without calling the checkpointComplete. 
> But by now we lost reference of 'logChannelsToFlush' (rolled logs which are 
> yet to be closed), because it is set to null before we try to flush/close 
> individually rolledlogs. The next execution of 'checkpoint' (after 
> flushinterval) wouldn't be knowing about the rolledlogs it failed to 
> flush/close the previous time and it would flush the newly rolledlogs. So the 
> failure of flush/close of the previous rolledlogs goes unnoticed completely. 
> in EntryLogger.java
>         void flushRotatedLogs() throws IOException {
>         List<BufferedLogChannel> channels = null;
>         long flushedLogId = INVALID_LID;
>         synchronized (this) {
>             channels = logChannelsToFlush;
>             logChannelsToFlush = null;               <--------- here we set 
> 'logChannelsToFlush' to null before it tries to flush/close rolledlogs 
>         }
>         if (null == channels) {
>             return;
>         }
>         for (BufferedLogChannel channel : channels) {
>             channel.flush(true);                      
> <------------IOEXception can happen here or in the following closeFileChannel 
> call             
>             // since this channel is only used for writing, after flushing 
> the channel,
>             // we had to close the underlying file channel. Otherwise, we 
> might end up
>             // leaking fds which cause the disk spaces could not be reclaimed.
>             closeFileChannel(channel);
>             if (channel.getLogId() > flushedLogId) {
>                 flushedLogId = channel.getLogId();
>             }
>             LOG.info("Synced entry logger {} to disk.", channel.getLogId());
>         }
>         // move the leastUnflushedLogId ptr
>         leastUnflushedLogId = flushedLogId + 1;
>     }
> in SyncThread.java
>     public void checkpoint(Checkpoint checkpoint) {
>         try {
>             checkpoint = ledgerStorage.checkpoint(checkpoint);
>         } catch (NoWritableLedgerDirException e) {
>             LOG.error("No writeable ledger directories", e);
>             dirsListener.allDisksFull();
>             return;
>         } catch (IOException e) {
>             LOG.error("Exception flushing ledgers", e); <-----that IOExc gets 
> propagated to this method and here it is caught and not dealt appropriately   
>  
>             return;
>         }
>         try {
>             checkpointSource.checkpointComplete(checkpoint, true);
>         } catch (IOException e) {
>             LOG.error("Exception marking checkpoint as complete", e);
>             dirsListener.allDisksFull();
>         }
>     }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Assigned] (BOOKKEEPER-1044) Entrylogger is not readding rolled logs back to the logChannelsToFlush list when exception happens while trying to flush rolled logs

Reply via email to