Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-19 Thread Brendan Duddridge
I'm using SQLCipher in my main app and it's using SQLite 3.15.2. However,
my little sample app where I could also easily demonstrate the problem, is
using whatever the built-in SQLite version is in macOS Sierra 10.12.4. I
couldn't find SQLITE_MMAP_READWRITE in the Couchbase Lite source anywhere,
so I'm sure it's not setting it. All I know now is with mmap I/O on, I get
the corruption when the power goes out. With it off, I don't.

On Wed, Apr 19, 2017 at 12:21 AM, Dan Kennedy  wrote:

> On 04/19/2017 02:42 AM, Jens Alfke wrote:
>
>> On Apr 18, 2017, at 2:20 AM, Deon Brewis  wrote:
>>>
>>> It's not like it was subtle - it's a dead on repro. I was able to repro
>>> this by doing a power cycle 2 hours after shutting the app down. OSX didn't
>>> seem to have any interest in flushing mmap files until you soft reboot the
>>> machine.
>>>
>> OK, hang on — I just reread the docs on memory-mapped I/O in SQLite.
>> Memory-mapping is *only used for reads*, never for writes:
>>
>
> It was used for writes in versions before 3.10.0 (January 2016). And still
> is if SQLITE_MMAP_READWRITE is defined (not the default).
>
> Dan.
>
>
>
>
>
>
>
>> When updating the database file, SQLite always makes a copy of the page
 content into heap memory before modifying the page. This is necessary for
 two reasons. First, changes to the database are not supposed to be visible
 to other processes until after the transaction commits and so the changes
 must occur in private memory. Second, SQLite uses a read-only memory map to
 prevent stray pointers in the application from overwriting and corrupting
 the database file.

>>> — https://www.sqlite.org/mmap.html
>>
>> Therefore I can’t imagine how using it could trigger database corruption.
>> It doesn’t affect the way data is written at all!
>>
>> I accept that both of you have experimentally seen that memory-mapping
>> leads to corruption, so I can only assume that either the above
>> documentation is wrong, or that there’s some subtle bug in SQLite that
>> alters the way data is written when memory-mapping is enabled.
>>
>> —Jens
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-19 Thread Dan Kennedy

On 04/19/2017 02:42 AM, Jens Alfke wrote:

On Apr 18, 2017, at 2:20 AM, Deon Brewis  wrote:

It's not like it was subtle - it's a dead on repro. I was able to repro this by 
doing a power cycle 2 hours after shutting the app down. OSX didn't seem to 
have any interest in flushing mmap files until you soft reboot the machine.

OK, hang on — I just reread the docs on memory-mapped I/O in SQLite. 
Memory-mapping is *only used for reads*, never for writes:


It was used for writes in versions before 3.10.0 (January 2016). And 
still is if SQLITE_MMAP_READWRITE is defined (not the default).


Dan.








When updating the database file, SQLite always makes a copy of the page content 
into heap memory before modifying the page. This is necessary for two reasons. 
First, changes to the database are not supposed to be visible to other 
processes until after the transaction commits and so the changes must occur in 
private memory. Second, SQLite uses a read-only memory map to prevent stray 
pointers in the application from overwriting and corrupting the database file.

— https://www.sqlite.org/mmap.html

Therefore I can’t imagine how using it could trigger database corruption. It 
doesn’t affect the way data is written at all!

I accept that both of you have experimentally seen that memory-mapping leads to 
corruption, so I can only assume that either the above documentation is wrong, 
or that there’s some subtle bug in SQLite that alters the way data is written 
when memory-mapping is enabled.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jens Alfke

> On Apr 18, 2017, at 9:46 PM, Deon Brewis  wrote:
> 
> I did report it:
> http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html

Thanks for the pointer to the thread. There was a reply by Dan Kennedy shortly 
thereafter:

>> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db 
>> file is written using plain old write(), just as in non-mmap mode: 
>> 
>>http://sqlite.org/src/info/67c5d3c646c8198c
>> 
>> It would be interesting to know if this clears the problem in your 
>> environment. 

Have you re-run your test with versions of SQLite from after this change?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Deon Brewis
> That is a hugely important piece of information, and while I thank you for 
> letting me know, I’m shocked that this hasn’t been conveyed to the SQLite 
> developers (or, apparently, to Apple.) Did you report it anywhere?

I did report it:
http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html

- Deon 

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jens Alfke
Sent: Tuesday, April 18, 2017 10:47 AM
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS


> On Apr 18, 2017, at 1:50 AM, Deon Brewis <de...@outlook.com> wrote:
> 
> Are you by change using memory mapped IO (MMAP_SIZE something other than 0)?
> 
> This does not work on OSX. Not even remotely. I tracked an issue down in 
> November 2015, and was able to trivially corrupt a database 100% of the time 
> using the repro steps below. This happens long after our app gets shut down 
> and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.

That is a hugely important piece of information, and while I thank you for 
letting me know, I’m shocked that this hasn’t been conveyed to the SQLite 
developers (or, apparently, to Apple.) Did you report it anywhere?

I’m the architect of Couchbase Lite and the lead developer for iOS and macOS. I 
enabled SQLite’s memory-mapped I/O at least two years ago. I never considered 
it could be problematic since (a) Brendan is IIRC the only iOS/Mac developer 
who’s reported database corruption, and (b) I assumed scenarios like this would 
be covered as part of SQLite testing, either by SQLite themselves or by Apple.

> “From the OSX documentation:
> Note that while fsync() will flush all data from the host to the drive (i.e. 
> the "permanent storage device"), the drive itself may not physically write 
> the data to the platters for quite some time and it may be written in an 
> out-of-order sequence. Specifically, if the drive loses power or the OS 
> crashes, the application may find that only some or none of their data was 
> written.  The disk drive may also re-order the data so that later writes may 
> be present, while earlier writes are not.

This is a statement about hard disk controller firmware and is true for any OS. 
(I used to work for Apple, and corresponded with filesystem architect Dominic 
Giampaolo about this back in the day.) Some disk controllers don’t flush all 
cached data to the platters when told to flush, apparently to boost benchmark 
scores. Darwin has a FULLFSYNC command (an option to ioctl) that triggers an 
expensive full reset of the disk controller, which does produce a full flush.

The filesystem periodically performs FULLFSYNCs to ensure durability of changes 
to filesystem metadata, so that the filesystem itself can’t become corrupted by 
power loss. So I’m surprised that, in your test, cutting power after 
macroscopic amounts of time (2 minutes) have passed since closing the SQLite 
file still resulted in data being lost.

This doesn’t seem like it would be specific to memory-mapped I/O, though. 
Darwin has a universal buffer cache (like many other kernels) so memory-mapped 
writes and ordinary file writes are treated the same way by the filesystem 
cache.

Of course it’s possible there’s some kind of OS bug involved here; if so, it 
seems pretty serious. I’ll be glad to file a bug report with Apple if it turns 
out to be so.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jens Alfke

> On Apr 18, 2017, at 2:35 PM, Bob Friesenhahn  
> wrote:
> 
> If the filesystem implementation is not fully-coherent, then data written by 
> programmed file I/O may not be reflected in the memory mapped space, 
> resulting in programs using something other than what is in the file, 
> resulting in wrong data being written to the file.

Darwin is fully coherent this way (that’s basically what a Universal Buffer 
Cache guarantees.)

Even if it weren’t, the incoherency would cause problems while making changes, 
i.e. _before_ the power-loss, not afterwards. Which is not what’s seen.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Bob Friesenhahn

On Tue, 18 Apr 2017, Jens Alfke wrote:




On Apr 18, 2017, at 2:20 AM, Deon Brewis  wrote:

It's not like it was subtle - it's a dead on repro. I was able to repro this by 
doing a power cycle 2 hours after shutting the app down. OSX didn't seem to 
have any interest in flushing mmap files until you soft reboot the machine.


OK, hang on — I just reread the docs on memory-mapped I/O in SQLite. 
Memory-mapping is *only used for reads*, never for writes:


When updating the database file, SQLite always makes a copy of the page content 
into heap memory before modifying the page. This is necessary for two reasons. 
First, changes to the database are not supposed to be visible to other 
processes until after the transaction commits and so the changes must occur in 
private memory. Second, SQLite uses a read-only memory map to prevent stray 
pointers in the application from overwriting and corrupting the database file.

— https://www.sqlite.org/mmap.html

Therefore I can’t imagine how using it could trigger database corruption. It 
doesn’t affect the way data is written at all!


If the filesystem implementation is not fully-coherent, then data 
written by programmed file I/O may not be reflected in the memory 
mapped space, resulting in programs using something other than what is 
in the file, resulting in wrong data being written to the file.


GraphicsMagick has a configure test for this and some operating 
systems fail the test.  Sqlite3 is free to adopt this same test.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Jens,

It would be a good test if you could independently verify my findings using
the sample app I wrote to see if you are able to reproduce the corruption
with memory mapped I/O turned ON and the fact that there's no corruption
when it's turned OFF. I know it seem strange given the documentation you
found.

Can I ask what the benefit of having memory mapped I/O is for SQLite? And
what are the drawbacks of turning it off? I understand what memory mapped
I/O is, reducing the amount of memory needed to load in the data from a
file. I use it in other parts of my app such as adding file attachments to
an email, but I use the higher level NSDataReadingzMappedAlways option on
NSData.

NSData *pdfData = [[NSData alloc] initWithContentsOfURL:pdfURL options:
NSDataReadingMappedAlways error:nil];


I'm just wondering if I'm going to encounter any other issues by disabling
it. Perhaps it needs to be conditionalized for Mac vs. iOS?

On Tue, Apr 18, 2017 at 1:42 PM, Jens Alfke  wrote:

>
> > On Apr 18, 2017, at 2:20 AM, Deon Brewis  wrote:
> >
> > It's not like it was subtle - it's a dead on repro. I was able to repro
> this by doing a power cycle 2 hours after shutting the app down. OSX didn't
> seem to have any interest in flushing mmap files until you soft reboot the
> machine.
>
> OK, hang on — I just reread the docs on memory-mapped I/O in SQLite.
> Memory-mapping is *only used for reads*, never for writes:
>
> >> When updating the database file, SQLite always makes a copy of the page
> content into heap memory before modifying the page. This is necessary for
> two reasons. First, changes to the database are not supposed to be visible
> to other processes until after the transaction commits and so the changes
> must occur in private memory. Second, SQLite uses a read-only memory map to
> prevent stray pointers in the application from overwriting and corrupting
> the database file.
> — https://www.sqlite.org/mmap.html
>
> Therefore I can’t imagine how using it could trigger database corruption.
> It doesn’t affect the way data is written at all!
>
> I accept that both of you have experimentally seen that memory-mapping
> leads to corruption, so I can only assume that either the above
> documentation is wrong, or that there’s some subtle bug in SQLite that
> alters the way data is written when memory-mapping is enabled.
>
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jens Alfke

> On Apr 18, 2017, at 12:55 PM, Jeffrey Mattox  wrote:
> 
> I'm an iOS and macOS developer.  Mac app bundles are special in other ways 
> beside just having a bit set.

Brendan is not storing his database in the app bundle. Both he and Simon have 
already said that.

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jeffrey Mattox
I'm an iOS and macOS developer.  Mac app bundles are special in other ways 
beside just having a bit set.  For one, there's a security check somewhere that 
verifies that the app bundle has not been changed, as those files are expected 
to be read-only.  Apple says apps should put their data elsewhere (e.g., 
~/Documents/).  I'd never store a file that I write to in the app bundle.

Jeff


> On Apr 18, 2017, at 1:45 PM, Simon Slavin  wrote:
> 
> I don’t think this can be the problem.  As you wrote, a package is just a 
> folder with a bit set.  Not only do they not affect anything but it’s 
> difficult for your app to even tell whether the file is in a folder or a 
> package.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jens Alfke

> On Apr 18, 2017, at 2:20 AM, Deon Brewis  wrote:
> 
> It's not like it was subtle - it's a dead on repro. I was able to repro this 
> by doing a power cycle 2 hours after shutting the app down. OSX didn't seem 
> to have any interest in flushing mmap files until you soft reboot the machine.

OK, hang on — I just reread the docs on memory-mapped I/O in SQLite. 
Memory-mapping is *only used for reads*, never for writes:

>> When updating the database file, SQLite always makes a copy of the page 
>> content into heap memory before modifying the page. This is necessary for 
>> two reasons. First, changes to the database are not supposed to be visible 
>> to other processes until after the transaction commits and so the changes 
>> must occur in private memory. Second, SQLite uses a read-only memory map to 
>> prevent stray pointers in the application from overwriting and corrupting 
>> the database file.
— https://www.sqlite.org/mmap.html

Therefore I can’t imagine how using it could trigger database corruption. It 
doesn’t affect the way data is written at all!

I accept that both of you have experimentally seen that memory-mapping leads to 
corruption, so I can only assume that either the above documentation is wrong, 
or that there’s some subtle bug in SQLite that alters the way data is written 
when memory-mapping is enabled.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin

On 18 Apr 2017, at 7:37pm, Brendan Duddridge  wrote:

> Well, perhaps the way I've been bundling the db.sqlite file within a
> package triggers this particular bug. Although a package is really just a
> sub-folder, so I don't see how that would make a difference.

I don’t think this can be the problem.  As you wrote, a package is just a 
folder with a bit set.  Not only do they not affect anything but it’s difficult 
for your app to even tell whether the file is in a folder or a package.

If you’re keeping the database inside the application’s bundle, that’s 
different.  But you’re not doing that.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
> I’m the architect of Couchbase Lite and the lead developer for iOS and
> macOS. I enabled SQLite’s memory-mapped I/O at least two years ago. I never
> considered it could be problematic since (a) Brendan is IIRC the only
> iOS/Mac developer who’s reported database corruption, and (b) I assumed
> scenarios like this would be covered as part of SQLite testing, either by
> SQLite themselves or by Apple.
>

Well, perhaps the way I've been bundling the db.sqlite file within a
package triggers this particular bug. Although a package is really just a
sub-folder, so I don't see how that would make a difference. I'm not sure.
But I've been testing more and more now and by disabling the memory mapped
I/O, the database file corruption bug appears to be gone. It's all working
great now. And I could easily cause the corruption prior to that.

I just reverted back to the build that included the memory mapped I/O call
and I was again able to corrupt the database file with the power failure.
Then I again used the version that had memory mapped I/O disabled and I was
no longer able to corrupt the database. So I've tested it both ways
repeatedly and I think I'm going to conclude that the memory mapped I/O
call is the culprit.

Thanks again Deon for sharing your experience with memory mapped I/O in
SQLite on macOS. And thanks Jens for starting this thread to try and get to
the bottom of this problem.

Brendan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin
On 18 Apr 2017, at 6:46pm, Jens Alfke  wrote:

> This is a statement about hard disk controller firmware and is true for any 
> OS. (I used to work for Apple, and corresponded with filesystem architect 
> Dominic Giampaolo about this back in the day.) Some disk controllers don’t 
> flush all cached data to the platters when told to flush, apparently to boost 
> benchmark scores. Darwin has a FULLFSYNC command (an option to ioctl) that 
> triggers an expensive full reset of the disk controller, which does produce a 
> full flush.

That’s what "PRAGMA fullfsync" is about.  From what I can see, SQLite uses it 
in a way that Apple’s documentation says should work.  But it seems that 
despite being all Apple says is necessary, it’s not doing the job.  Note, 
however, that SQLite documentation says that the default value for this flag is 
off, even when running on a Mac.

> [snip]
> 
> This doesn’t seem like it would be specific to memory-mapped I/O, though. 
> Darwin has a universal buffer cache (like many other kernels) so 
> memory-mapped writes and ordinary file writes are treated the same way by the 
> filesystem cache.

Need to check Darwin source code.  The documentation for some operating systems 
considers memory-mapping to be special, and separate from normal file access.  
So things which configure how normal file access is done sometimes don’t apply 
to memory-mapped file access or memory-mapped I/O in general.  Power-loss is a 
good test of memory-mapped changes and problems with them.

> Of course it’s possible there’s some kind of OS bug involved here; if so, it 
> seems pretty serious. I’ll be glad to file a bug report with Apple if it 
> turns out to be so.

People who’ve posted things connected to this don’t tend to mention that 
they’ve set "PRAGMA fullfsync = ON".  It’s the sort of thing you’d only know to 
do after significant debugging.  Maybe they’re not doing it and that’s the 
cause of their problems.

Simon.

Obligatory Disclosure: I’m a Macintosh specialist, a Macintosh programmer, and 
I know a lot about computer security.  But this stuff is at the edge of my area 
of competence.  I’ve never worked for Apple and don’t know the tiny details of 
this stuff.  Don't depend on the things I write here for critical decisions.  
If you’re a member of Apple’s Developer Programme perhaps you should get Apple 
involved.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Jens Alfke

> On Apr 18, 2017, at 1:50 AM, Deon Brewis  wrote:
> 
> Are you by change using memory mapped IO (MMAP_SIZE something other than 0)?
> 
> This does not work on OSX. Not even remotely. I tracked an issue down in 
> November 2015, and was able to trivially corrupt a database 100% of the time 
> using the repro steps below. This happens long after our app gets shut down 
> and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.

That is a hugely important piece of information, and while I thank you for 
letting me know, I’m shocked that this hasn’t been conveyed to the SQLite 
developers (or, apparently, to Apple.) Did you report it anywhere?

I’m the architect of Couchbase Lite and the lead developer for iOS and macOS. I 
enabled SQLite’s memory-mapped I/O at least two years ago. I never considered 
it could be problematic since (a) Brendan is IIRC the only iOS/Mac developer 
who’s reported database corruption, and (b) I assumed scenarios like this would 
be covered as part of SQLite testing, either by SQLite themselves or by Apple.

> “From the OSX documentation:
> Note that while fsync() will flush all data from the host to the drive (i.e. 
> the "permanent storage device"), the drive itself may not physically write 
> the data to the platters for quite some time and it may be written in an 
> out-of-order sequence. Specifically, if the drive loses power or the OS 
> crashes, the application may find that only some or none of their data was 
> written.  The disk drive may also re-order the data so that later writes may 
> be present, while earlier writes are not.

This is a statement about hard disk controller firmware and is true for any OS. 
(I used to work for Apple, and corresponded with filesystem architect Dominic 
Giampaolo about this back in the day.) Some disk controllers don’t flush all 
cached data to the platters when told to flush, apparently to boost benchmark 
scores. Darwin has a FULLFSYNC command (an option to ioctl) that triggers an 
expensive full reset of the disk controller, which does produce a full flush.

The filesystem periodically performs FULLFSYNCs to ensure durability of changes 
to filesystem metadata, so that the filesystem itself can’t become corrupted by 
power loss. So I’m surprised that, in your test, cutting power after 
macroscopic amounts of time (2 minutes) have passed since closing the SQLite 
file still resulted in data being lost.

This doesn’t seem like it would be specific to memory-mapped I/O, though. 
Darwin has a universal buffer cache (like many other kernels) so memory-mapped 
writes and ordinary file writes are treated the same way by the filesystem 
cache.

Of course it’s possible there’s some kind of OS bug involved here; if so, it 
seems pretty serious. I’ll be glad to file a bug report with Apple if it turns 
out to be so.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin

On 18 Apr 2017, at 11:52am, Richard Hipp  wrote:

> I was unaware that memory-mapped I/O was busted on MacOS.  I will
> investigate further, but probably the solution will be that we will
> completely disable memory-mapped I/O on MacOS, just as we have had to
> do for OpenBSD.

I’m guessing that disabling memory-mapping will also prevent the 
SQLITE_IOERR_VNODE error that was mentioned here last week.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Richard Hipp
On 4/18/17, Brendan Duddridge  wrote:
>
> I commented out this line of code in the Couchbase lite CBL_SQliteStorage.m
> source file and no more corruption
>
> //int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,
> (SInt64)kSQLiteMMapSize, (SInt64)-1);
>
>
> It would be really awesome of this were added to the How to Corrupt an
> SQLite Database web page.

I was unaware that memory-mapped I/O was busted on MacOS.  I will
investigate further, but probably the solution will be that we will
completely disable memory-mapped I/O on MacOS, just as we have had to
do for OpenBSD.  Thanks for letting us know that this is a problem.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Deon,

BINGO

I commented out this line of code in the Couchbase lite CBL_SQliteStorage.m
source file and no more corruption

//int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,
(SInt64)kSQLiteMMapSize, (SInt64)-1);


It would be really awesome of this were added to the How to Corrupt an
SQLite Database web page. Although it would probably be prudent if someone
could verify these findings using my sample app and a patched version of
Couchbase Lite. I'll post the solution to the Couchbase Lite Github issue
that I started.

I used to be able to reproduce the corruption with a single power cut. But
now after commenting out that code, I'm unable to corrupt the database
after about 6 or 7 power cuts. So I think it's safe to say that this
problem is solved.

Big thanks to you Deon. You just saved my bacon! Thank you so much.


Brendan


On Tue, Apr 18, 2017 at 3:20 AM, Deon Brewis <de...@outlook.com> wrote:

> Yip. Tried that. At some point I had like 6 or 7 debug switches in the app
> to try all manner of FULLSYNC/WAL/SYNCHRONOUS combinations. At the end it
> was the MMAP_SIZE that did it.
>
> It's not like it was subtle - it's a dead on repro. I was able to repro
> this by doing a power cycle 2 hours after shutting the app down. OSX didn't
> seem to have any interest in flushing mmap files until you soft reboot the
> machine.
>
> The last time I tried this though was on Yosemite and Mavericks and
> whatever version of SQLITE was out at the time, so things may be different
> now. But it would be the first place I would look for corruption on OSX
> related to power cycling.
>
> - Deon
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Simon Slavin
> Sent: Tuesday, April 18, 2017 1:57 AM
> To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS
>
>
> On 18 Apr 2017, at 9:50am, Deon Brewis <de...@outlook.com> wrote:
>
> > “From the OSX documentation:
> >
> > Note that while fsync() will flush all data from the host to the drive
> (i.e. the "permanent storage device"),
>
> Deon,
>
> I’m not sure this is related, but have you seen
>
> <http://www.sqlite.org/pragma.html#pragma_fullfsync>
>
> <http://www.sqlite.org/pragma.html#pragma_checkpoint_fullfsync>
>
> ?
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Deon Brewis
Yip. Tried that. At some point I had like 6 or 7 debug switches in the app to 
try all manner of FULLSYNC/WAL/SYNCHRONOUS combinations. At the end it was the 
MMAP_SIZE that did it.

It's not like it was subtle - it's a dead on repro. I was able to repro this by 
doing a power cycle 2 hours after shutting the app down. OSX didn't seem to 
have any interest in flushing mmap files until you soft reboot the machine.

The last time I tried this though was on Yosemite and Mavericks and whatever 
version of SQLITE was out at the time, so things may be different now. But it 
would be the first place I would look for corruption on OSX related to power 
cycling.

- Deon

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Simon Slavin
Sent: Tuesday, April 18, 2017 1:57 AM
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS


On 18 Apr 2017, at 9:50am, Deon Brewis <de...@outlook.com> wrote:

> “From the OSX documentation:
> 
> Note that while fsync() will flush all data from the host to the drive (i.e. 
> the "permanent storage device"),

Deon,

I’m not sure this is related, but have you seen

<http://www.sqlite.org/pragma.html#pragma_fullfsync>

<http://www.sqlite.org/pragma.html#pragma_checkpoint_fullfsync>

?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin

On 18 Apr 2017, at 10:07am, Brendan Duddridge  wrote:
> 
>// Enable memory-mapped I/O if available

Please try an alternative version of that code which definitely does not use 
memory mapping in any way.  You may be able to do it by changing that IFDEF 
sequence.  Or you should instead have SQLite execute this command

PRAGMA mmap_size=0;

before it opens any files.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Deon,

That's interesting about the memory mapped IO. I just looked in the source
for Couchbase Lite and they do something like that on setup:

+ (void) firstTimeSetup {

// Test the version of the actual SQLite implementation at runtime.
Necessary because

// the app might be linked with a custom version of SQLite (like
SQLCipher) instead of the

// system library, so the actual version/features may differ from what
was declared in

// sqlite3.h at compile time.

Log(@"Couchbase Lite using SQLite version %s (%s)",

sqlite3_libversion(), sqlite3_sourceid());

#if 0

for (int i=0; true; i++) {

const char* opt = sqlite3_compileoption_get(i);

if (!opt)

break;

Log(@"SQLite option '%s'", opt);

}

#endif

sSQLiteVersion = sqlite3_libversion_number();

Assert(sSQLiteVersion >= 3007000,

   @"SQLite library is too old (%s); needs to be at least 3.7",
sqlite3_libversion());


// Enable memory-mapped I/O if available

#ifndef SQLITE_CONFIG_MMAP_SIZE

#define SQLITE_CONFIG_MMAP_SIZE22  /* sqlite3_int64, sqlite3_int64 */

#endif

int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE, (SInt64)
kSQLiteMMapSize, (SInt64)-1);

if (err != SQLITE_OK)

Log(@"FYI, couldn't enable SQLite mmap: error %d", err);


sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, NULL);

}

From your description, it sounds like that be causing this problem. Hmm...

Brendan


On Tue, Apr 18, 2017 at 2:50 AM, Deon Brewis <de...@outlook.com> wrote:

> Are you by change using memory mapped IO (MMAP_SIZE something other than
> 0)?
>
> This does not work on OSX. Not even remotely. I tracked an issue down in
> November 2015, and was able to trivially corrupt a database 100% of the
> time using the repro steps below. This happens long after our app gets shut
> down and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.
>
> Repro (100%):
>
> 1) Run our app
> 2) Shutdown our app
> 3) Wait for our app to cleanly shut down – nothing showing in Activity
> Monitor – and app.db-wal deleted from disk (i.e. SQLITE clean close)
> 4) Wait 2 minutes (so our app isn’t running in this 2 minute period at all)
>
> 5) Copy the our db file to a NAS
> 6) Hard reboot the machine (power cycle).
> 7) Copy the db file to the NAS again (no reopening the app, just copy the
> file back to the NAS after the reboot).
>
> Observe…
>
> The file from #5 still works fine.
> The file from #7 is corrupted.
>
> Note that the app or sqlite is nowhere involved in between #5 and #7
>
>
> I made this note in the bug when I fixed it - I believe it's related, but
> don't have the exact context:
> “From the OSX documentation:
>
> Note that while fsync() will flush all data from the host to the drive
> (i.e. the "permanent storage device"), the drive itself may not physically
> write the data to the platters for quite some time and it may be written in
> an out-of-order sequence. Specifically, if the drive loses power or the OS
> crashes, the application may find that only some or none of their data was
> written.  The disk drive may also re-order the data so that later writes
> may be present, while earlier writes are not.
>
> This is not a theoretical edge case.  This scenario is easily reproduced
> with real world workloads and drive power failures.”
>
> - Deon
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Brendan Duddridge
> Sent: Tuesday, April 18, 2017 1:36 AM
> To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS
>
> Hi Rowan,
>
> My apologies. My sample app does use NSDocument. But my production app
> doesn't. And they both exhibit the same problem.
>
> On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <row...@dug.com> wrote:
>
> > Hey Brendan,
> >
> > I'm no OSX expert, but from what I've read this afternoon about
> > NSDocument and friends combined with what I know about sqlite I have
> > to say you are completely mad to continue passing sqlite databases to
> > NSDocument,
> > *especially* as you don't define your own sub-class to do any of the
> > file management.
> >
> > Relying on some NSDocument sub-class builtin to the system and then
> > also opening the DB with CouchbaseLite may well violate section 2.2.1
> > of "how to corrupt an sqlite database" (multiple copies of sqlite
> > linked into the same application). Even if not, it sounds like
> > NSDocument has a tendency to copy files around for eg. auto-save
> purposes.
> >
> > I'd be very *ve

Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Simon,

I'm using Couchbase Lite actually, not Couchbase server. Couchbase Lite 1.4
to be specific. Couchbase Lite 2.0 is still under development and it does
in fact use SQLite under the hood as does Couchbase Lite 1.4. They had
worked on using ForestDB, but I think they've abandoned that for their
Couchbase Lite 2.0 version.

I've tried calling pragma fullfsync=1 just after the database file was
opened, but I still had the same problem with corruption after a power
failure.

There's a big discussion I've had with the Couchbase Lite developers here
on their Github issues:

https://github.com/couchbase/couchbase-lite-ios/issues/1482

(newer posts are at the bottom)

On Tue, Apr 18, 2017 at 2:52 AM, Simon Slavin  wrote:

>
> On 18 Apr 2017, at 6:01am, Brendan Duddridge  wrote:
>
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
>
> You should be safe storing a SQLite database inside a package.  As you
> write, a package is really just a folder with one extra bit set.
>
> Any demonstration code which uses the SQLite API directly we can attempt
> to debug.
>
> If the problem you need solving is actually with Couchbase, you can solve
> it by upgrading to any version from 2.0 onwards.  Since that no longer uses
> SQLite, SQLite database corruption can no longer be a problem.  Current
> versions of Couchbase no longer use SQLite for a persistence layer and
> Couchbase developers are unlikely to assist with any problem you find with
> such an old version.
>
> If your program demonstrating the problem has access to the SQLite
> database solely via libsqlcrypt.a, can you explain what you’re using it for
> ?  Is your database actually direct access to a Lotus Notes file ?
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin

On 18 Apr 2017, at 9:50am, Deon Brewis  wrote:

> “From the OSX documentation:
> 
> Note that while fsync() will flush all data from the host to the drive (i.e. 
> the "permanent storage device"),

Deon,

I’m not sure this is related, but have you seen





?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Richard,

I just did another test with my PowerFailureTest app. I launched it and the
count of rows in it was 4402. Without even running the function to import
my data that writes to the SQLite file, I cut the power (holding down the
power button on my MBP).

When my Mac finally rebooted, I saw that the WAL file was 0 bytes, the SHM
file was 32768 bytes, and the db.sqlite file was 10.5 MB (as it was before
the test).

But now when I launch the test app, These messages display in the console
log:

*02:42:38.630| WARNING: SQLite error (code 11): database corruption at line
60553 of [2ef4f3a5b1] {at errorLogCallback:125}*

*02:42:38.631| WARNING: SQLite error (code 11): statement aborts at 18:
[SELECT revs.doc_id, docid, revid, sequence FROM revs, docs WHERE
docs.doc_id = revs.doc_id AND current=1 AND deleted=0 ORDER BY docid ASC,
revid DESC LIMIT ? OFFSET ?] database disk i {at errorLogCallback:125}*

*2017-04-18 02:42:38.630791-0600 PowerFailureTest[672:9264] Unknown error
calling sqlite3_step (11: database disk image is malformed) rs*

*2017-04-18 02:42:38.632178-0600 PowerFailureTest[672:9264] doc count: 408*

So there's definitely some corruption going on. The file was fine before I
cut the power.



On Tue, Apr 18, 2017 at 2:23 AM, Richard Hipp  wrote:

> On 4/18/17, Brendan Duddridge  wrote:
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
>
> Corruption-by-power-loss problems tend to be very sensitive to timing.
> If you are able to reliably generate the problem by leisurely cutting
> power (holding down the power button) that suggests something else.
> You might be able to create the same effect by issuing a SIGKILL on
> the process (kill -9) at the same point that you would normally cut
> the power.
>
> I'm thinking you have some problem similar to that which is described
> in section 2.4 of https://www.sqlite.org/howtocorrupt.html
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Simon Slavin

On 18 Apr 2017, at 6:01am, Brendan Duddridge  wrote:

> Perhaps the only think I can think of that I'm doing that may be unorthodox
> is by storing the SQLite files inside a macOS package. But at the unix
> layer that's really just a directory, so I don't know how that could cause
> a problem. Unless macOS treats the package in a way differently than a
> normal folder and is causing things to get moved around or written when a
> power failure occurs.

You should be safe storing a SQLite database inside a package.  As you write, a 
package is really just a folder with one extra bit set.

Any demonstration code which uses the SQLite API directly we can attempt to 
debug.

If the problem you need solving is actually with Couchbase, you can solve it by 
upgrading to any version from 2.0 onwards.  Since that no longer uses SQLite, 
SQLite database corruption can no longer be a problem.  Current versions of 
Couchbase no longer use SQLite for a persistence layer and Couchbase developers 
are unlikely to assist with any problem you find with such an old version.  

If your program demonstrating the problem has access to the SQLite database 
solely via libsqlcrypt.a, can you explain what you’re using it for ?  Is your 
database actually direct access to a Lotus Notes file ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Deon Brewis
Are you by change using memory mapped IO (MMAP_SIZE something other than 0)?

This does not work on OSX. Not even remotely. I tracked an issue down in 
November 2015, and was able to trivially corrupt a database 100% of the time 
using the repro steps below. This happens long after our app gets shut down and 
SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.

Repro (100%):

1) Run our app
2) Shutdown our app
3) Wait for our app to cleanly shut down – nothing showing in Activity Monitor 
– and app.db-wal deleted from disk (i.e. SQLITE clean close)
4) Wait 2 minutes (so our app isn’t running in this 2 minute period at all)

5) Copy the our db file to a NAS
6) Hard reboot the machine (power cycle).
7) Copy the db file to the NAS again (no reopening the app, just copy the file 
back to the NAS after the reboot).

Observe…

The file from #5 still works fine.
The file from #7 is corrupted.

Note that the app or sqlite is nowhere involved in between #5 and #7


I made this note in the bug when I fixed it - I believe it's related, but don't 
have the exact context:
“From the OSX documentation:

Note that while fsync() will flush all data from the host to the drive (i.e. 
the "permanent storage device"), the drive itself may not physically write the 
data to the platters for quite some time and it may be written in an 
out-of-order sequence. Specifically, if the drive loses power or the OS 
crashes, the application may find that only some or none of their data was 
written.  The disk drive may also re-order the data so that later writes may be 
present, while earlier writes are not.

This is not a theoretical edge case.  This scenario is easily reproduced with 
real world workloads and drive power failures.”

- Deon

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Brendan Duddridge
Sent: Tuesday, April 18, 2017 1:36 AM
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

Hi Rowan,

My apologies. My sample app does use NSDocument. But my production app doesn't. 
And they both exhibit the same problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <row...@dug.com> wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about 
> NSDocument and friends combined with what I know about sqlite I have 
> to say you are completely mad to continue passing sqlite databases to 
> NSDocument,
> *especially* as you don't define your own sub-class to do any of the 
> file management.
>
> Relying on some NSDocument sub-class builtin to the system and then 
> also opening the DB with CouchbaseLite may well violate section 2.2.1 
> of "how to corrupt an sqlite database" (multiple copies of sqlite 
> linked into the same application). Even if not, it sounds like 
> NSDocument has a tendency to copy files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take 
> the advice given to you on bountysource and watch your application's 
> file system operations to begin to understand what is going on behind 
> the scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge <brend...@gmail.com> wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there 
> > somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write 
> > to the file system the moment the power shuts down so I don't even 
> > understand
> how
> > the database file can get corrupted.  Unless the corruption happens 
> > when the database file is opened up next and tries to use this 
> > improper WAL file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in 
> > the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting 
> > the power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse 
> > of SQLite that I can find. Although the developers of Couchbase Lite 
> > are far smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the 
> > unix layer that's really just a directory, so I don't know how that 
> > could
> cause
> > a problem. Unless macOS treats the package in a way differently than 
> > a normal folder and is causing things to get moved around

Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hi Rowan,

My apologies. My sample app does use NSDocument. But my production app
doesn't. And they both exhibit the same problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth  wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about NSDocument
> and friends combined with what I know about sqlite I have to say you are
> completely mad to continue passing sqlite databases to NSDocument,
> *especially* as you don't define your own sub-class to do any of the file
> management.
>
> Relying on some NSDocument sub-class builtin to the system and then also
> opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
> corrupt an sqlite database" (multiple copies of sqlite linked into the same
> application). Even if not, it sounds like NSDocument has a tendency to copy
> files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take the
> advice given to you on bountysource and watch your application's file
> system operations to begin to understand what is going on behind the
> scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge  wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write to the
> > file system the moment the power shuts down so I don't even understand
> how
> > the database file can get corrupted.  Unless the corruption happens when
> > the database file is opened up next and tries to use this improper WAL
> > file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse of
> > SQLite that I can find. Although the developers of Couchbase Lite are far
> > smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
> >
> > This problem has been plaguing me for quite a long time actually. I hope
> > that I can find a solution somehow.
> >
> > Thanks,
> >
> > Brendan
> >
> > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > collation).  That allows me to analyze your database and WAL file.
> > >
> > > It appears that the WAL file is not the correct WAL file for that
> > > database.  It is as if someone has taken an unrelated WAL file and
> > > renamed it to have the same base name as your database.  Or the other
> > > way around - someone has renamed your database to have the same base
> > > name as the WAL file.
> > >
> > > Your demonstration application does not call SQLite directly.  Instead
> > > it appears to use two libraries that in turn call SQLite:
> > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > sources to these other libraries, so I am unable to deduce what they
> > > are doing.
> > >
> > > So, in the absence of further evidence, I am going to diagnose this as
> > > a misuse of SQLite by one of the two libraries that you are linking -
> > > probably a misuse in the form of trying to rename or unlink or
> > > otherwise modify the database file using ordinary operating system
> > > calls while a connection to the database is open.
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Brendan Duddridge
Hey Rowan,

In my Mac app I'm actually not using NSDocument. I used to and then I
changed my document subclass to extend NSObject instead, just in case
NSDocument was doing bad things behind my back. I had also disabled
auto-save and versions anyway. For a while though I tried extending
NSPersistentDocument even though my app isn't using CoreData. But it also
didn't make a difference.

I'm still storing my SQLite database in a file package even without using
NSDocument. The strange thing though is this is never a problem on iOS
where my code extends UIDocument, but for all intents and purposes behaves
pretty much the same way, at least in terms of organization of the database
files. Only macOS seems to have the corruption problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth  wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about NSDocument
> and friends combined with what I know about sqlite I have to say you are
> completely mad to continue passing sqlite databases to NSDocument,
> *especially* as you don't define your own sub-class to do any of the file
> management.
>
> Relying on some NSDocument sub-class builtin to the system and then also
> opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
> corrupt an sqlite database" (multiple copies of sqlite linked into the same
> application). Even if not, it sounds like NSDocument has a tendency to copy
> files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take the
> advice given to you on bountysource and watch your application's file
> system operations to begin to understand what is going on behind the
> scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge  wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write to the
> > file system the moment the power shuts down so I don't even understand
> how
> > the database file can get corrupted.  Unless the corruption happens when
> > the database file is opened up next and tries to use this improper WAL
> > file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse of
> > SQLite that I can find. Although the developers of Couchbase Lite are far
> > smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
> >
> > This problem has been plaguing me for quite a long time actually. I hope
> > that I can find a solution somehow.
> >
> > Thanks,
> >
> > Brendan
> >
> > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > collation).  That allows me to analyze your database and WAL file.
> > >
> > > It appears that the WAL file is not the correct WAL file for that
> > > database.  It is as if someone has taken an unrelated WAL file and
> > > renamed it to have the same base name as your database.  Or the other
> > > way around - someone has renamed your database to have the same base
> > > name as the WAL file.
> > >
> > > Your demonstration application does not call SQLite directly.  Instead
> > > it appears to use two libraries that in turn call SQLite:
> > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > sources to these other libraries, so I am unable to deduce what they
> > > are doing.
> > >
> > > So, in the absence of further evidence, I am going to diagnose this as
> > > a misuse of SQLite by one of the two libraries that you are linking -
> > > probably a misuse in the form of trying to rename or unlink or
> > > otherwise modify the database file using ordinary operating system
> > > calls while a connection to the database is open.
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users 

Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Richard Hipp
On 4/18/17, Brendan Duddridge  wrote:
> In both cases though I can cause SQLite file corruption by cutting the
> power on my MacBook Pro.

Corruption-by-power-loss problems tend to be very sensitive to timing.
If you are able to reliably generate the problem by leisurely cutting
power (holding down the power button) that suggests something else.
You might be able to create the same effect by issuing a SIGKILL on
the process (kill -9) at the same point that you would normally cut
the power.

I'm thinking you have some problem similar to that which is described
in section 2.4 of https://www.sqlite.org/howtocorrupt.html

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-18 Thread Rowan Worth
Hey Brendan,

I'm no OSX expert, but from what I've read this afternoon about NSDocument
and friends combined with what I know about sqlite I have to say you are
completely mad to continue passing sqlite databases to NSDocument,
*especially* as you don't define your own sub-class to do any of the file
management.

Relying on some NSDocument sub-class builtin to the system and then also
opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
corrupt an sqlite database" (multiple copies of sqlite linked into the same
application). Even if not, it sounds like NSDocument has a tendency to copy
files around for eg. auto-save purposes.

I'd be very *very* surprised if there's an sqlite bug here. I'd take the
advice given to you on bountysource and watch your application's file
system operations to begin to understand what is going on behind the scenes.

-Rowan



On 18 April 2017 at 13:01, Brendan Duddridge  wrote:

> Thanks Richard for your reply.
>
> Sorry about the COLLATE problem. That's a Couchbase Lite thing.
>
> I find it weird that a different WAL file is getting in there somehow when
> a power failure occurs. I'm a bit stumped at how it can even write to the
> file system the moment the power shuts down so I don't even understand how
> the database file can get corrupted.  Unless the corruption happens when
> the database file is opened up next and tries to use this improper WAL
> file.
>
> In my main project I'm using SQLCipher as my SQLite layer. But in the demo
> project I posted, I'm just using the built-in macOS Sierra SQLite library.
> In both cases though I can cause SQLite file corruption by cutting the
> power on my MacBook Pro.
>
> I'm going to do further investigations to see if there's any misuse of
> SQLite that I can find. Although the developers of Couchbase Lite are far
> smarter than I and I'm sure they're doing things by the book.
>
> Perhaps the only think I can think of that I'm doing that may be unorthodox
> is by storing the SQLite files inside a macOS package. But at the unix
> layer that's really just a directory, so I don't know how that could cause
> a problem. Unless macOS treats the package in a way differently than a
> normal folder and is causing things to get moved around or written when a
> power failure occurs.
>
> This problem has been plaguing me for quite a long time actually. I hope
> that I can find a solution somehow.
>
> Thanks,
>
> Brendan
>
> > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > collation).  That allows me to analyze your database and WAL file.
> >
> > It appears that the WAL file is not the correct WAL file for that
> > database.  It is as if someone has taken an unrelated WAL file and
> > renamed it to have the same base name as your database.  Or the other
> > way around - someone has renamed your database to have the same base
> > name as the WAL file.
> >
> > Your demonstration application does not call SQLite directly.  Instead
> > it appears to use two libraries that in turn call SQLite:
> > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > sources to these other libraries, so I am unable to deduce what they
> > are doing.
> >
> > So, in the absence of further evidence, I am going to diagnose this as
> > a misuse of SQLite by one of the two libraries that you are linking -
> > probably a misuse in the form of trying to rename or unlink or
> > otherwise modify the database file using ordinary operating system
> > calls while a connection to the database is open.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-17 Thread Brendan Duddridge
Thanks Richard for your reply.

Sorry about the COLLATE problem. That's a Couchbase Lite thing.

I find it weird that a different WAL file is getting in there somehow when
a power failure occurs. I'm a bit stumped at how it can even write to the
file system the moment the power shuts down so I don't even understand how
the database file can get corrupted.  Unless the corruption happens when
the database file is opened up next and tries to use this improper WAL file.

In my main project I'm using SQLCipher as my SQLite layer. But in the demo
project I posted, I'm just using the built-in macOS Sierra SQLite library.
In both cases though I can cause SQLite file corruption by cutting the
power on my MacBook Pro.

I'm going to do further investigations to see if there's any misuse of
SQLite that I can find. Although the developers of Couchbase Lite are far
smarter than I and I'm sure they're doing things by the book.

Perhaps the only think I can think of that I'm doing that may be unorthodox
is by storing the SQLite files inside a macOS package. But at the unix
layer that's really just a directory, so I don't know how that could cause
a problem. Unless macOS treats the package in a way differently than a
normal folder and is causing things to get moved around or written when a
power failure occurs.

This problem has been plaguing me for quite a long time actually. I hope
that I can find a solution somehow.

Thanks,

Brendan

> I worked around the "COLLATE JSON" problem (by writing my own JSON
> collation).  That allows me to analyze your database and WAL file.
>
> It appears that the WAL file is not the correct WAL file for that
> database.  It is as if someone has taken an unrelated WAL file and
> renamed it to have the same base name as your database.  Or the other
> way around - someone has renamed your database to have the same base
> name as the WAL file.
>
> Your demonstration application does not call SQLite directly.  Instead
> it appears to use two libraries that in turn call SQLite:
> libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> sources to these other libraries, so I am unable to deduce what they
> are doing.
>
> So, in the absence of further evidence, I am going to diagnose this as
> a misuse of SQLite by one of the two libraries that you are linking -
> probably a misuse in the form of trying to rename or unlink or
> otherwise modify the database file using ordinary operating system
> calls while a connection to the database is open.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-16 Thread Richard Hipp
On 4/16/17, Richard Hipp  wrote:
> I cannot seem to find the implementation for COLLATE JSON anywhere in
> your source code.  Can you give me a hint as to which source file I
> should be looking in?

I worked around the "COLLATE JSON" problem (by writing my own JSON
collation).  That allows me to analyze your database and WAL file.

It appears that the WAL file is not the correct WAL file for that
database.  It is as if someone has taken an unrelated WAL file and
renamed it to have the same base name as your database.  Or the other
way around - someone has renamed your database to have the same base
name as the WAL file.

Your demonstration application does not call SQLite directly.  Instead
it appears to use two libraries that in turn call SQLite:
libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
sources to these other libraries, so I am unable to deduce what they
are doing.

So, in the absence of further evidence, I am going to diagnose this as
a misuse of SQLite by one of the two libraries that you are linking -
probably a misuse in the form of trying to rename or unlink or
otherwise modify the database file using ordinary operating system
calls while a connection to the database is open.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-16 Thread Richard Hipp
I cannot seem to find the implementation for COLLATE JSON anywhere in
your source code.  Can you give me a hint as to which source file I
should be looking in?

On 4/16/17, Brendan Duddridge  wrote:
> Hi,
>
> I know this is an old thread, but I just found it now when I was doing some
> research on this topic. Thanks Jens for starting this thread. When Jens
> said he knew a developer who could create a corrupted database by turning
> off the power, I'm pretty sure he was talking about me. This has been an
> ongoing problem for me for a while now. My customers keep getting corrupted
> SQLite databases due to power failures or forced shutdowns.
>
> I wrote a small sample app to demonstrate the problem just to make sure it
> wasn't my own app causing the problem in some complex way.
>
> Someone on this thread said to post a sample of a corrupted database. So
> here's a link to download a file that has become corrupted:
>
> https://www.dropbox.com/s/5xwsfhorrfvefjy/corrupted-sqlite-powerfailure-mac.zip?dl=0
>
> The way it started out was an SQLite file with a table in it called docs
> (along with other tables) with 408 rows in it. I imported a bunch of data
> into the SQLite file so that there should be 4402 rows in the docs table,
> along with data stored in other tables.
>
> After the import and the data was committed to the database (while the app
> was still launched but not doing anything), I shutdown my MacBook Pro
> forcibly by holding the power key down until it shut off. When I restarted
> and opened the database file using the Mac SQLiteManager app, The docs
> table now contained only 2631 rows, with a lot of NULL data displayed in
> the docs table. That should not be possible with the library I'm using.
>
> If you're interested in trying out the sample application I wrote to see
> the corruption for yourself, you can download it here:
>
> https://www.dropbox.com/s/q2r4bz7n1d5fgag/PowerFailureTest.zip?dl=0
>
> You'll need a Mac and Xcode 8.x to run it. I would just have only Xcode and
> the sample app running when I caused the Mac's power to be killed.
>
> When the database is opened, it's using WAL journal_mode and also has
> pragma fullfsync=1 enabled.
>
> Any ideas of settings I could try to resolve this problem would be greatly
> appreciated by me and my customers.
>
> Thanks,
>
> Brendan Duddridge
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2017-04-16 Thread Brendan Duddridge
Hi,

I know this is an old thread, but I just found it now when I was doing some
research on this topic. Thanks Jens for starting this thread. When Jens
said he knew a developer who could create a corrupted database by turning
off the power, I'm pretty sure he was talking about me. This has been an
ongoing problem for me for a while now. My customers keep getting corrupted
SQLite databases due to power failures or forced shutdowns.

I wrote a small sample app to demonstrate the problem just to make sure it
wasn't my own app causing the problem in some complex way.

Someone on this thread said to post a sample of a corrupted database. So
here's a link to download a file that has become corrupted:

https://www.dropbox.com/s/5xwsfhorrfvefjy/corrupted-sqlite-powerfailure-mac.zip?dl=0

The way it started out was an SQLite file with a table in it called docs
(along with other tables) with 408 rows in it. I imported a bunch of data
into the SQLite file so that there should be 4402 rows in the docs table,
along with data stored in other tables.

After the import and the data was committed to the database (while the app
was still launched but not doing anything), I shutdown my MacBook Pro
forcibly by holding the power key down until it shut off. When I restarted
and opened the database file using the Mac SQLiteManager app, The docs
table now contained only 2631 rows, with a lot of NULL data displayed in
the docs table. That should not be possible with the library I'm using.

If you're interested in trying out the sample application I wrote to see
the corruption for yourself, you can download it here:

https://www.dropbox.com/s/q2r4bz7n1d5fgag/PowerFailureTest.zip?dl=0

You'll need a Mac and Xcode 8.x to run it. I would just have only Xcode and
the sample app running when I caused the Mac's power to be killed.

When the database is opened, it's using WAL journal_mode and also has
pragma fullfsync=1 enabled.

Any ideas of settings I could try to resolve this problem would be greatly
appreciated by me and my customers.

Thanks,

Brendan Duddridge
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-16 Thread Richard Hipp
On 11/16/16, Simon Slavin  wrote:
>
> There's also a tool from the SQLite team which can analyze a corrupt SQLite
> file and tell some things about how it is corrupt.  Unfortunately I can't
> remember what it's called or where to find it.  But I heard about it on this
> list and I hope someone can.
>

make showdb

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-16 Thread Simon Slavin

On 16 Nov 2016, at 7:25am, Jens Alfke  wrote:

> The case I’m concerned about happens on Macs, as I said, and the developer 
> says he can trigger it in his app via a power failure or a forced shutdown 
> (holding the power button for 5 seconds.) This version of our library is one 
> I wrote, in Objective-C; it uses a thin Obj-C wrapper around SQLite, but it 
> doesn’t do anything magic, and although it started out as 3rd party code I’ve 
> tweaked it enough that I know it inside and out.

Ah, that's enough information that someone around here should be able to help.  
Can you use the shell tool to run

PRAGMA integrity_check

on one such corrupt file and post what it finds ?  If the output is long, 
posting a summary is fine.

There's also a tool from the SQLite team which can analyze a corrupt SQLite 
file and tell some things about how it is corrupt.  Unfortunately I can't 
remember what it's called or where to find it.  But I heard about it on this 
list and I hope someone can.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 1:46 PM, Simon Slavin  wrote:
> 
> Do these people admit they're letting their phones run out of power ?  

There’s nothing wrong with letting your phone run out of power, and software 
should be resilient to it. I don’t think that’s the problem, though, because 
mobile OS’s perform a clean shutdown when power drops below a minimum 
threshold, so it’s not like pulling the plug on a desktop computer (or holding 
down the power button on a laptop because the kernel froze.)

> If not, I suspect some sort of programming error somewhere.  Not necessarily 
> in your own programming but sometimes in whatever database API you're using 
> which eventually results in calls to the SQLite API.
> So is your software in C or C++ and calling the SQLite API directly ?

It’s possible. Our Android library is written in Java and uses some Java 
wrapper library around SQLite. I don’t know the details myself.

The case I’m concerned about happens on Macs, as I said, and the developer says 
he can trigger it in his app via a power failure or a forced shutdown (holding 
the power button for 5 seconds.) This version of our library is one I wrote, in 
Objective-C; it uses a thin Obj-C wrapper around SQLite, but it doesn’t do 
anything magic, and although it started out as 3rd party code I’ve tweaked it 
enough that I know it inside and out.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread R Smith


On 2016/11/15 10:34 PM, Jens Alfke wrote:



On Nov 15, 2016, at 10:57 AM, Simon Slavin  wrote:

sqlite> PRAGMA checkpoint_fullfsync;
1

I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value 
is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. 
(Contradicting the docs.)

So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, 
but not at other times that it fsyncs. What does that mean in actual use, 
assuming that I always use WAL mode? Is there still an opportunity for 
corruption in the face of power failures?

(Sorry to be frothing at the mouth about this; but my team’s dealing with a few 
users/customers whose apps encounter db corruption, on Android as well as 
macOS, and we’re getting really frustrated trying to figure out what’s going 
on.)


Quite OK to be unsettled by learning that a flaw in the system that you 
assumed did not exist, might be the cause of your troubles. I think 
however something is missing in the complete understanding, so to be clear:


Calling F_FULLFSYNC when checkpointing or otherwise invokes a contract 
between the running software (your system) and the Operating System 
whereby the Operating system promises to A - put the current buffer's 
worth of written data INTO the BUS feeding the writable media, AND B - 
then asking said media to confirm the writing has happened (committed)  
BEFORE handing back control (moving your thread pointer along). This is 
not exactly the same for all OSes, but more or less similar.


This can slow down things, but sometimes the security is worth the price 
paid in latency. However, Hard drive manufacturers tend to sometimes lie 
about having committed your data. It is a well known and almost 
universally used method in standard desktop / laptop computers for the 
harddrive to tell the OS that: "YES indeed, I have committed" when in 
fact it is still piping data into the platters. Yes SSD's are better at 
this by simple virtue of lower latency from buffer-to-silicone, but they 
are not above lying either.


This means that unless you have a SERVER quality drive with typically 
its own battery-backup that guarantees ANY buffered writes to reach the 
platters, there simply is zero guarantee that all writes WILL go to 
disk, and any normal system that guarantees it lies.


This does not mean however that you should be experiencing corruption. 
SQLite might not be able to guarantee all writes reaching the disk, but 
in most cases, if a final write did not happen, the usual last step in 
committing a transaction is deleting / truncating a journal file or 
writing a checkpoint marker or such, which, if it did not happen, should 
have the entire write roll back (next time you open the DB) and leave 
you in a non-corrupt state. IF this does not happen it means a write may 
have happened out of order (not very common but can happen) or some 
other worse problem occurred - most importantly, FULL_FSYNC isn't the 
wild goose to be chasing. Whether or not any write happened is never an 
acceptable cause of corruption, so trying to wrestle with the thing that 
promises to make writes happen "more" as a causal relation to a 
corruption problem, is simply moot. (This is vigorously tested with 
every release of SQLite too).


If you can get the DB files (journals and all) from such a system where 
a user claims to be able to reproduce the corruption reliably, that 
would be an easy thing to check and the Devs here might learn something 
from it. You can simply make something that copy all the DB files before 
opening them at startup, until you have produced a corrupt DB, then 
those last copied files will be the corrupted DB files that can be 
investigated.


Good luck!
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Jean-Christophe Deschamps


At 22:41 15/11/2016, you wrote:

So if you're truly worried about flush-to-disk what do you do 
?  Solution 1 is to buy hard disks rated for servers -- sometimes 
called "enterprise-class hard drives" -- and to set the DIP switches 
to tell them they're being used on a server.  Those things are 
intended for use in servers, and ACID can be important to a server, so 
they support it properly and do not lie.


It's even both best and worse than that.

I enjoy running an old and ugly diy machine with 8 SAS HP disks (15k 
rpm) arranged in RAID6 behind a serious hardware controler having a 
good share of battery-backed RAM. Those enterprise-class disks don't 
have any switch and will lie to you and the OS as much as they can if 
you let them do and don't ask for acknowledgement that each write is 
final, but the key piece is the Areca controler which actually manages 
the RAID array and knows when/if some write has actually landed on 
surface. Should the power supply vanish, it keeps a track of what still 
needs to be written and will silently update the disks when power comes 
back, even before the OS is loaded.


So no, even very expensive hard disks themselves don't take any step to 
avoid lying to you/OS if you/OS don't care over "details", but the 
controler (if a serious one) will fill the gap and insure that 
everything is fine.


To be honest I'm almost certain that there can exist extreme situations 
where, for instance, the RAID array is in tatters (e.g. more than 2 of 
6 disks simultaneously failing) and won't accept writes while the 
controler battery is going dangerously low.
But if your needs are thusly mission critical you probably can afford 
enough redundant sophisticated hardware to make the probability of a 
disaster (e.g. a one-bit flip) as low as required.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Bob Friesenhahn

On Tue, 15 Nov 2016, Simon Slavin wrote:


Modern storage subsystems (hard disk or SSD) intended for use in a 
normal user computer always lie to the OS about flushing to disk. 
The apparent increase in speed from doing this is so big that every 
manufacturer has to do it, or risk having every review harp on about 
how slow their computer performs in real life tasks.  There is no 
way to get these things to be honest.  Because not one person in ten 
thousand cares.  Most people are more interested in how fast they 
can launch Microsoft Word.


I don't think that things are as bad as you say.  Some modern 
filesystems (e.g. zfs) depend on hardware cache flush to work yet 
there has not been a rash of corrupted filesystems.  Many people use 
these filesystems on non-enterprise hardware.


There are some devices which do fail to flush their cache or write 
data properly.  Some SSDs likely re-write data while in use or idle 
due to wear leveling in a way which causes a possibility of loss.


Enterprise disks are more tolerant of vibration, are specified to 
have fewer uncorrected bit errors, and try for a bounded time to 
recover bad sectors.


MacOS's target market is not storage.  The useful mass storage 
offerings for hardware running MacOS is rather limited.


Operating systems where fsync() or fdatasync() do not truely commit 
data to hardware are broken.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 8:34pm, Jens Alfke  wrote:

> (Sorry to be frothing at the mouth about this; but my team’s dealing with a 
> few users/customers whose apps encounter db corruption, on Android as well as 
> macOS, and we’re getting really frustrated trying to figure out what’s going 
> on.)

Do these people admit they're letting their phones run out of power ?  If not, 
I suspect some sort of programming error somewhere.  Not necessarily in your 
own programming but sometimes in whatever database API you're using which 
eventually results in calls to the SQLite API.

So is your software in C or C++ and calling the SQLite API directly ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 8:18pm, Jens Alfke  wrote:

> The only way to guarantee a true barrier is to really-and-truly flush the 
> disk controller, which requires not simply flushing but resetting it. That’s 
> what F_FULLFSYNC on macOS does. (Unfortunately it makes the disk controller 
> unresponsive for a macroscopic amount of time while it resets, which is why 
> it’s slow. Worse, it adds latency to _all_ I/O on the system. Flush too often 
> on a desktop computer and your music player will glitch, video playback will 
> drop frames, etc.)

It's worse than that.

Modern storage subsystems (hard disk or SSD) intended for use in a normal user 
computer always lie to the OS about flushing to disk.  The apparent increase in 
speed from doing this is so big that every manufacturer has to do it, or risk 
having every review harp on about how slow their computer performs in real life 
tasks.  There is no way to get these things to be honest.  Because not one 
person in ten thousand cares.  Most people are more interested in how fast they 
can launch Microsoft Word.

So if you're truly worried about flush-to-disk what do you do ?  Solution 1 is 
to buy hard disks rated for servers -- sometimes called "enterprise-class hard 
drives" -- and to set the DIP switches to tell them they're being used on a 
server.  Those things are intended for use in servers, and ACID can be 
important to a server, so they support it properly and do not lie.

Solution 2 works on any hard drive.  To make it flush cache just don't do 
anything that involves disk access for a couple of seconds.  They note the 
inactivity, then decide they have time to write away cache.

But if all the contact you have with the hard disk is through the OS ?  Forget 
it.  If power to the computer is but you're probably depending on the amount of 
power left in the system to flush data to disk.  And given the tiny power 
consumption those things have these days most of the time that's enough.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 8:34pm, Jens Alfke  wrote:

> On Nov 15, 2016, at 10:57 AM, Simon Slavin  wrote:
> 
>> sqlite> PRAGMA checkpoint_fullfsync;
>> 1
> 
> I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its 
> value is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. 
> (Contradicting the docs.)
> 
> So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, 
> but not at other times that it fsyncs.

Yet the sequence I posted was from my own Macintosh, running 10.12.[something I 
can't talk about].  And as you see from my post the pragma starts off as 0 but 
if you set it to 1 it'll stay at 1.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 10:57 AM, Simon Slavin  wrote:
> 
> sqlite> PRAGMA checkpoint_fullfsync;
> 1

I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value 
is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. 
(Contradicting the docs.)

So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, 
but not at other times that it fsyncs. What does that mean in actual use, 
assuming that I always use WAL mode? Is there still an opportunity for 
corruption in the face of power failures?

(Sorry to be frothing at the mouth about this; but my team’s dealing with a few 
users/customers whose apps encounter db corruption, on Android as well as 
macOS, and we’re getting really frustrated trying to figure out what’s going 
on.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 10:57 AM, Simon Slavin  wrote:
> 
> My understanding is that F_FULLFSYNC still works the way you describe on a 
> Mac and SQLite still uses it the way the documentation says.  But I'm not in 
> touch with either development group.


This seems like a strange decision, considering that in other respects SQLite 
(wisely) defaults to settings that favor safety and durability over 
performance, e.g. synchronous mode.

So perhaps the “How To Corrupt A SQLite Database” page should be updated to add 
“Use SQLite on a Mac without knowing the secret that you should enable `pragma 
fullfsync`?” :-(


Incidentally, that page (https://www.sqlite.org/howtocorrupt.html 
) says:

>> Actually, if one is only concerned with atomic and consistent writes and is 
>> willing to forego durable writes, the sync operation does not need to wait 
>> until the content is completely stored on persistent media. Instead, the 
>> sync operation can be thought of as an I/O barrier. As long as all writes 
>> that occur before the sync are completed before any write that happens after 
>> the sync, no database corruption will occur.

So … based on my understanding (what I was told by a filesystem guru, albeit 10 
years ago), a typical fsync does _not_ act as an I/O barrier. This is because 
it’s not really syncing (as the same page says, "most consumer-grade mass 
storage devices lie about syncing”). Disk controllers don’t write physical 
sectors in the order in which the writes are received; instead they buffer 
them, and then optimize the order of writes to minimize seeks. So if power is 
abruptly lost, it is not true that there’s a barrier before which all writes 
have been persisted and after which none have; instead the pattern is likely to 
be random. 

The only way to guarantee a true barrier is to really-and-truly flush the disk 
controller, which requires not simply flushing but resetting it. That’s what 
F_FULLFSYNC on macOS does. (Unfortunately it makes the disk controller 
unresponsive for a macroscopic amount of time while it resets, which is why 
it’s slow. Worse, it adds latency to _all_ I/O on the system. Flush too often 
on a desktop computer and your music player will glitch, video playback will 
drop frames, etc.)

I’m totally willing to believe that these behaviors are different with SSDs, 
especially ones that use newer types of interfaces or are soldered directly to 
the motherboard (as on some MacBooks.) I’d love to be educated about the 
current state of the art :)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 6:11pm, Jens Alfke  wrote:

> I verified in the built-in sqlite3 tool on macOS 10.12.1 that the result of 
> `pragma fullfsync` is 0.

The default setting is 0.  But you can change it.  On my Mac running 10.12.x,

SQLite version 3.14.0 2016-07-26 15:17:14
Enter ".help" for usage hints.
sqlite> PRAGMA fullfsync;
0
sqlite> PRAGMA fullfsync=1;
sqlite> PRAGMA fullfsync;
1
sqlite> PRAGMA checkpoint_fullfsync;
1
sqlite> 

My understanding is that F_FULLFSYNC still works the way you describe on a Mac 
and SQLite still uses it the way the documentation says.  But I'm not in touch 
with either development group.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Database corruption, and PRAGMA fullfsync on macOS

2016-11-15 Thread Jens Alfke
I’m seeing conflicting information about SQLite’s use of F_FULLFSYNC on macOS 
when committing transactions. This is making me nervous about durability and 
the possibility of database corruption.

The SQLite docs for PRAGMA fullfsync 
(https://www.sqlite.org/pragma.html#pragma_fullfsync 
) say:
>  This flag determines whether or not the F_FULLFSYNC syncing method is used 
> on systems that support it. The default value of the fullfsync flag is off. 
> Only Mac OS X supports F_FULLFSYNC.

I verified in the built-in sqlite3 tool on macOS 10.12.1 that the result of 
`pragma fullfsync` is 0.

My understanding is that issuing a F_FULLFSYNC ioctl call is important for 
ensuring that all writes have been flushed to physical storage, since disk 
controllers may lie and ignore a regular sync instruction, leaving some writes 
in their volatile cache memory. (I was working at Apple, on a project using 
SQLite, during the development of OS X 10.4 circa 2006; there were a number of 
databases that got corrupted by kernel panics or forced power-off, until Apple 
had the F_FULLFSYNC call added to SQLite.)

Apple’s documentation for Core Data (which uses SQLite) says: 
(https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreData/PersistentStoreFeatures.html
 
)
> In OS X the fsync command does not guarantee that bytes are written, so 
> SQLite sends a F_FULLFSYNC request to the kernel to ensure that the bytes are 
> actually written through to the drive platter. This request causes the kernel 
> to flush all buffers to the drives and causes the drives to flush their track 
> caches. Without this, there is a significantly large window of time within 
> which data will reside in volatile memory. If system failure occurs you risk 
> data corruption.


This contradicts the SQLite docs, but it does match my understanding, and until 
today I thought that SQLite on macOS enabled fullfsync by default.

So which of these is true?
• F_FULLFSYNC is no longer necessary for full durability (due to some change in 
the way SQLite commits?)
or
• SQLite by default sacrifices durability and data integrity for performance by 
skipping the [slow] F_FULLFSYNC call

This is disturbing because we do have one developer using our framework who has 
been reporting occasional database corruption. He says it’s happened to users, 
and he claims to be able to make it happen by forcibly powering-down his laptop 
(though we haven’t been able to reproduce it.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users