[ 
https://issues.apache.org/jira/browse/DERBY-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855714#comment-16855714
 ] 

Rick Hillegas commented on DERBY-7034:
--------------------------------------

Thanks for the nudge, David. I have briefly browsed the link you provided. Is
the following a fair summary of the problem?

1) The Linux fsync() call does not obey its contract. It can return before
writes have been durably recorded. The fix is to skip caching and use direct
I/O instead.

2) FileDescriptor.sync() probably relies on fsync() (I haven't verified this
claim). Therefore, system software like the Derby engine cannot guarantee that
commits have been durably recorded. That, in turn, can corrupt data when, for
instance, a USB stick is yanked out of its slot prematurely.
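
For reference, here is a minimal sketch of the pattern in question (not Derby
code; the class and file name are made up). As far as I know, on Linux
FileDescriptor.sync() is implemented with fsync(2), so any weakness in fsync()
error reporting surfaces here:

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;
import java.nio.charset.StandardCharsets;

public class SyncDemo {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream out = new FileOutputStream("commit.log", true)) {
            out.write("commit record\n".getBytes(StandardCharsets.UTF_8));
            try {
                // Maps to fsync() on POSIX platforms (FlushFileBuffers on Windows).
                out.getFD().sync();
            } catch (SyncFailedException sfe) {
                // If this failure is swallowed, or the call is retried and a later
                // "success" is trusted, the commit may never actually be durable.
                throw sfe;
            }
        }
    }
}
{code}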

Can you help me understand why this is not a problem bigger than Derby? It
seems to me that the root issue is that FileDescriptor.sync() does not fulfill
its contract. Has a bug been logged against OpenJDK?

Thanks,
-Rick


> Derby's sync() handling can lead to database corruption (at least on Linux)
> ---------------------------------------------------------------------------
>
>                 Key: DERBY-7034
>                 URL: https://issues.apache.org/jira/browse/DERBY-7034
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.14.2.0
>            Reporter: David Sitsky
>            Priority: Major
>
> I recently read about "fsyncgate 2018" that the Postgres team raised: 
> https://wiki.postgresql.org/wiki/Fsync_Errors.  
> https://lwn.net/Articles/752063/ has a good overview of the issue relating to 
> fsync() behaviour on Linux.  The short summary is that on some versions of 
> Linux, if you retry fsync() after it has failed, the retry will succeed and 
> you will end up with corrupted data on disk.
> From a quick glance at the Derby code, I have already seen two places where 
> sync() is retried in a loop, which is clearly dangerous.  There could be other 
> areas too.
> In LogAccessFile:
> {code}
>     /**
>      * Guarantee all writes up to the last call to flushLogAccessFile on disk.
>      * <p>
>      * A call for clients of LogAccessFile to insure that all data written
>      * up to the last call to flushLogAccessFile() are written to disk.
>      * This call will not return until those writes have hit disk.
>      * <p>
>      * Note that this routine may block waiting for I/O to complete so 
>      * callers should limit the number of resource held locked while this
>      * operation is called.  It is expected that the caller
>      * Note that this routine only "writes" the data to the file, this does not
>      * mean that the data has been synced to disk.  The only way to insure that
>      * is to first call switchLogBuffer() and then follow by a call of sync().
>      *
>      **/
>     public void syncLogAccessFile() 
>         throws IOException, StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against NFS
>             // mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 synchronized( this)
>                 {
>                     log.sync();
>                 }
>                 // the sync succeed, so return
>                 break;
>             }
>             catch( SyncFailedException sfe )
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep( 200 ); 
>                 }
>                 catch( InterruptedException ie )
>                 {
>                     InterruptStatus.setInterrupted();
>                 }
>                 if( i > 20 )
>                     throw StandardException.newException(
>                         SQLState.LOG_FULL, sfe);
>             }
>         }
>     }
> {code}
> And LogToFile has similar retry code, but it retries on any IOException rather 
> than specifically on SyncFailedException:
> {code}
>     /**
>      * Utility routine to call sync() on the input file descriptor.
>      * <p> 
>     */
>     private void syncFile( StorageRandomAccessFile raf) 
>         throws StandardException
>     {
>         for( int i=0; ; )
>         {
>             // 3311: JVM sync call sometimes fails under high load against NFS
>             // mounted disk.  We re-try to do this 20 times.
>             try
>             {
>                 raf.sync();
>                 // the sync succeed, so return
>                 break;
>             }
>             catch (IOException ioe)
>             {
>                 i++;
>                 try
>                 {
>                     // wait for .2 of a second, hopefully I/O is done by now
>                     // we wait a max of 4 seconds before we give up
>                     Thread.sleep(200);
>                 }
>                 catch( InterruptedException ie )
>                 {   
>                     InterruptStatus.setInterrupted();
>                 }
>                 if( i > 20 )
>                 {
>                     throw StandardException.newException(
>                                 SQLState.LOG_FULL, ioe);
>                 }
>             }
>         }
>     }
> {code}
> It seems Postgres, MySQL and MongoDB have already changed their code to 
> "panic" if an error comes from an fsync() call.
> There are a lot more complexities in how fsync() reports errors (if at all). 
> It is worth getting into this further, as I am not familiar with Derby's 
> internals and how affected it could be by this.
> Interestingly, people have indicated this issue is more likely to happen for 
> network filesystems (since write failures are more common due to the network 
> going down).  In the past it was easy just to say "NFS is broken", but in 
> actual fact the problem was, in some cases, with fsync() and how it was called 
> in a loop.
> I've been trying to find out whether Windows has similar issues, without much 
> luck.  But given the mysterious corruption issues I have seen in the past with 
> Windows/CIFS, I do wonder if this is related somehow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
