Re: [sqlite] Apple announces new File System with better ACID support

2016-06-14 Thread James K. Lowden
On Tue, 14 Jun 2016 10:49:05 +0900
ネイト・フィンドリー wrote:

> > On 13 Jun 2016, at 10:13pm, Richard Hipp  wrote:
> >
> > The rename-is-atomic assumption is so wide-spread in the Linux
> > world, that the linux kernel was modified to make renames closer to
> > being atomic on common filesystems such as EXT4.
> 
> http://man7.org/linux/man-pages/man2/rename.2.html

rename(2) *is* atomic.  That doesn't mean it's synchronous with respect
to external storage.  It only means that no two processes will ever see
the file "in flight" in two places.  If process A calls rename(N,M), at
no point will process B have access to both N and M.  Once M is
available, N is extinguished.  

That's a useful property for a process that succeeds, and for which the
OS successfully flushes the data to disk.  
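
A minimal Python sketch of that visibility property, assuming a POSIX
system.  The file names N and M and the helper rename_and_observe are
made up for illustration.  It only shows the end state seen through
the namespace; it says nothing about what has reached the disk.

```python
import os
import tempfile


def rename_and_observe(directory):
    """Rename N to M inside `directory` and report which names exist."""
    old = os.path.join(directory, "N")  # hypothetical file names
    new = os.path.join(directory, "M")
    with open(old, "w") as f:
        f.write("payload")
    # The kernel switches the directory entry in one step: other
    # processes see either N or M, never both and never neither.
    os.rename(old, new)
    return os.path.exists(old), os.path.exists(new)


with tempfile.TemporaryDirectory() as d:
    print(rename_and_observe(d))  # prints (False, True)
```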

When Richard says rename isn't atomic, he means that it's not
synchronous with respect to the disk.  It makes no guarantee that the
directory entries were updated on disk.  The rename happens in the
kernel's filesystem memory structures, which *eventually* are persisted
to disk.  I have heard that that time lag may be measured in seconds.  
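
An application that cares about that lag usually asks for the
directory update to be flushed itself.  A rough Python sketch,
assuming a POSIX system where a directory can be opened read-only and
passed to fsync; rename_durably is a hypothetical helper, and whether
the data truly reaches stable storage still depends on the filesystem
and the hardware.

```python
import os
import tempfile


def rename_durably(src, dst):
    """Rename src to dst, then ask the kernel to flush the directory.

    Assumes src and dst live in the same directory; otherwise both
    parent directories would need the same treatment.
    """
    os.rename(src, dst)
    parent = os.path.dirname(os.path.abspath(dst))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # flush the updated directory entries
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "journal.tmp")  # hypothetical names
    dst = os.path.join(d, "journal")
    with open(src, "w") as f:
        f.write("entry")
    rename_durably(src, dst)
```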

> I am interested to know what it would take to make linux renames
> fully atomic. Reading it as is, it feels like the action of rename
> would be the most important piece to making rename atomic.  The docs
> claim this is atomic.  What other aspects would be necessary?

To make Linux rename fully synchronous is technically infeasible and
politically impossible.  

On the political side, the preference in Linux is invariably for
performance, often achieved through ever-finer divisions of
responsibility.  As an example, Unix fsync(2) traditionally updated
both the file and its metadata; Linux divided those into fsync and
fdatasync, and added the requirement to call fsync on the directory.
What was once a single call became 2 or 3.  
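
To make the split concrete, here is a small Python sketch of the two
file-level calls.  write_synced and its data_only flag are invented
for this example; os.fdatasync is not available on every platform, so
the sketch falls back to os.fsync when it is missing.  The directory
flush is a separate fsync on a descriptor for the parent directory and
is not shown here.

```python
import os
import tempfile


def write_synced(path, data, data_only=False):
    """Write bytes to path and flush them with fdatasync or fsync."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # move Python's buffer into the kernel first
        if data_only and hasattr(os, "fdatasync"):
            # File contents, plus only the metadata needed to read
            # them back (for example the file size).
            os.fdatasync(f.fileno())
        else:
            # File contents and all of the file's metadata.
            os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as d:
    write_synced(os.path.join(d, "full"), b"abc")
    write_synced(os.path.join(d, "data"), b"abc", data_only=True)
```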

As a technical matter, it's really infeasible because there are too
many moving parts: kernel, filesystem driver, and hardware.  It is
possible for a human being to know what kind of disk is installed and
how configured, and to know the semantics of a given filesystem.  It is
not possible for the kernel to patrol all those things, and hence the
kernel cannot make any guarantees about them.  (To take an extreme
example: NFS.)  

By the way, every DBMS I know anything about (SQLite being no
exception) tends to eschew OS services except at the most minimal
level.  The internals of a DBMS carry a lot of state information
unavailable to the kernel that the DBMS uses to prioritize how memory
is used and when and where I/O is required.  That's why every DBMS has
its own logging mechanism, and some bypass the filesystem altogether.
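
SQLite's own logging can be inspected from any client.  The sketch
below uses Python's standard sqlite3 module and the journal_mode
pragma; journal_modes is a hypothetical helper, and the default mode
it reports depends on how the library was built and on the database
file, so no particular default is assumed.

```python
import os
import sqlite3
import tempfile


def journal_modes(db_path):
    """Return (current journal mode, mode after requesting WAL)."""
    conn = sqlite3.connect(db_path)
    try:
        current = conn.execute("PRAGMA journal_mode").fetchone()[0]
        # Ask SQLite to switch to its write-ahead log; the pragma
        # returns the mode that is actually in effect afterwards.
        after = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        return current, after
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as d:
    print(journal_modes(os.path.join(d, "example.db")))
```

Either way, the journal or log is an ordinary file managed by SQLite
itself, not a service the kernel provides.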

--jkl





___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Apple announces new File System with better ACID support

2016-06-13 Thread ネイト・フィンドリー
> On 13 Jun 2016, at 10:13pm, Richard Hipp  wrote:
>
> The rename-is-atomic assumption is so
> wide-spread in the Linux world, that the linux kernel was modified to
> make renames closer to being atomic on common filesystems such as
> EXT4.

http://man7.org/linux/man-pages/man2/rename.2.html

I am interested to know what it would take to make linux renames fully atomic.
Reading it as is, it feels like the action of rename would be the most important
piece to making rename atomic.  The docs claim this is atomic.  What other
aspects would be necessary?

Maybe the issue is simply that although there "is no point at which
another process attempting to access newpath will find it missing",
the "another process" doesn't know when the file is fully written to
disk?

Apologies if this is too off topic or obvious to everybody else.


Re: [sqlite] Apple announces new File System with better ACID support

2016-06-13 Thread Simon Slavin

On 13 Jun 2016, at 10:13pm, Richard Hipp  wrote:

> On 6/13/16, Simon Slavin  wrote:
> 
>> The relevance to this list is mostly in the last item above: atomic
>> safe-save primitives.
> 
> The documentation indicates that safe-save only does file rename
> operations atomically.

Aaah you're right.  I was hoping for better support at the file writing or 
locking/unlocking level.  Disappointed.

Simon.


Re: [sqlite] Apple announces new File System with better ACID support

2016-06-13 Thread Richard Hipp
On 6/13/16, Simon Slavin  wrote:
>
> The relevance to this list is mostly in the last item above: atomic
> safe-save primitives.

The documentation indicates that safe-save only does file rename
operations atomically.  This is of no help in making SQLite transactions
atomic.

SQLite cannot use file renaming because SQLite databases are used
concurrently by multiple processes, and so if one process moves the
file, it would move the file out from under other processes.
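
A small Python sketch of that hazard, assuming POSIX semantics (on
Windows the replace would typically fail while the file is open).  The
file names and the helper are hypothetical, and a second open handle
in the same process stands in for another process that already has
the database open.

```python
import os
import tempfile


def read_through_old_handle(directory):
    """Swap a file by rename while an older handle is still open."""
    db = os.path.join(directory, "data.db")  # hypothetical names
    replacement = os.path.join(directory, "data.db.new")
    with open(db, "w") as f:
        f.write("old")
    with open(replacement, "w") as f:
        f.write("new")

    reader = open(db)  # stands in for a process that opened it earlier
    try:
        os.replace(replacement, db)  # rename the new file over the old
        via_old_handle = reader.read()  # still the original file
    finally:
        reader.close()

    with open(db) as f:
        via_fresh_open = f.read()  # the name now points at the new file
    return via_old_handle, via_fresh_open


with tempfile.TemporaryDirectory() as d:
    print(read_through_old_handle(d))  # prints ('old', 'new')
```

The older handle keeps working against a file that no longer has a
name, so anything written through it would never be seen by a process
that opens the database afterwards.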

The safe-save feature appears to be an effort to aid grow-your-own
style atomicity that is commonly implemented by writing new content
into a new file, then renaming the new file over top of the old.  This
comes up a lot for applications that treat the filesystem as a
key/value database.  Many programmers have assumed that rename is
atomic on unix.  It is not.  The rename-is-atomic assumption is so
wide-spread in the Linux world, that the linux kernel was modified to
make renames closer to being atomic on common filesystems such as
EXT4.  I think this new feature of HFS+ is likely a similar effort.
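
For reference, the write-new-then-rename idiom described here looks
roughly like this in Python.  It is a sketch under assumptions, not
what any particular application or filesystem does: save_atomically
is a made-up name, the temporary file is created in the target's
directory so the rename stays on one filesystem, and the fsync before
the rename is the extra step careful applications add because the
rename alone promises nothing about the disk.

```python
import os
import tempfile


def save_atomically(path, data):
    """Replace the contents of path with data via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # get the new content out first
        os.replace(tmp_path, path)  # then swap the name in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "settings")  # hypothetical key/value file
    save_atomically(target, b"version=1")
    save_atomically(target, b"version=2")
```

Note that mkstemp creates the file with owner-only permissions, so a
real application might need to restore the original mode.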

-- 
D. Richard Hipp
d...@sqlite.org