Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Tue, Aug 08, 2017 at 10:30:33AM -0700, Jens Alfke wrote: > > On Aug 4, 2017, at 11:28 AM, Nico Williams wrote: > > Imagine a mode where there is only a WAL, and to checkpoint is to write > > a new WAL with only live contents and... rename(2) into place. > > What you’re describing is exactly how CouchDB’s storage engine works, > as well as descendants like Couchbase Server’s CouchStore and > ForestDB. (Note: I work for Couchbase.) Also how ZFS works. > Efficient lookups in a file like this require the existence of a bunch > of extraneous metadata like interior B-tree nodes. This metadata > changes all the time as records are written*, so a lot of it has to be > written out too along with every transaction, resulting in substantial > write amplification. Yes. It's called write magnification. ZFS deals with this by first committing to an intent log (ZIL) without writing all those interior nodes, then it eventually writes a proper transaction. One could do the same sort of thing for a single-file DB: write a number of transactions as intents, then once in a while "checkpoint" them by writing a b-tree transaction that includes all those updates. For readers this means always processing all intent log entries since the last b-tree-updating transaction. LSMs are... a lot lik this, IIUC. Are they not? > The other big drawback is that compaction (the checkpoint step you > describe) is very expensive in terms of I/O. I’ve known of CouchDB > systems that took many hours to compact their databases, and since > every write that occurs during a compaction has to be replayed onto > the new file after the copy before compaction completes, one can get > into a state where a busy database either never actually finishes > compacting, or has to temporarily block all writers just so it can get > the damn job done without interruption. (It’s a similar problem to GC > thrash.) There's definitely trade-offs. This is already the case for SQLite3's WAL. You have to checkpoint often, but when you do you lose read concurrency. When you value read concurrency highly you might tune the WAL max size up to reduce checkpoint frequence, and now you slow down checkpointing. Another thing that can be done is to write to the "next" DB as soon as possible, possibly synchronously with writes to the "current" DB. Then the checkpoint process is very simple: write the "close" TX to the "current" DB, rename the "next" into place, create the next "next". > We’ve also seen that, on low-end hardware like mobile devices, I/O > bandwidth is limited enough that a running compaction can really harm > the responsiveness of the _entire OS_, as well as cause significant > battery drain. Yes. But there you don't really need read concurrency, so SQLite3 without a WAL (or with a WALL but frequent checkpointing) will suffice. > * Modifying/rewriting a single record requires rewriting the leaf node > that points to it, which requires rewriting the parent node that > points to the leaf, and this ripples all the way up to the root node. I'm well aware :) Others on the list might not, naturally. One of the very nice features of an append-only single-file DB file format is that you can just "tail -f" (well, a binary tail -f) to replicate. If you add in the intent log concept, it's not so bad in terms of I/O, but you slow down readers somewhat. Another very nice feature of an append-only single-file DB file is high read concurrency. And yet another is fast recovery (throw away a truncated transaction and "checkpoint"). Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 8 Aug 2017, at 6:30pm, Jens Alfke wrote: > We’ve also seen that, on low-end hardware like mobile devices, I/O bandwidth > is limited enough that a running compaction can really harm the > responsiveness of the _entire OS_, as well as cause significant battery drain. Yes. Cannot stress enough that you don’t need VACUUM for efficient storage. It locks the entire database, it monopolises access to storage, and it does many writes to storage which means it’ll wear through Flash storage cycles. What used to be the great advantage of VACUUM — defragmentation — was useful for hard disks but does not give any advantage for Flash storage. The only advantages are to save filespace immediately after deleting data or indexes. You might want VACUUM if you delete data, then take a copy of your database for backup. But most SQLite databases are small enough that this doesn’t matter. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
> On Aug 4, 2017, at 11:28 AM, Nico Williams wrote: > > Imagine a mode where there is only a WAL, and to checkpoint is to write > a new WAL with only live contents and... rename(2) into place. What you’re describing is exactly how CouchDB’s storage engine works, as well as descendants like Couchbase Server’s CouchStore and ForestDB. (Note: I work for Couchbase.) Efficient lookups in a file like this require the existence of a bunch of extraneous metadata like interior B-tree nodes. This metadata changes all the time as records are written*, so a lot of it has to be written out too along with every transaction, resulting in substantial write amplification. The other big drawback is that compaction (the checkpoint step you describe) is very expensive in terms of I/O. I’ve known of CouchDB systems that took many hours to compact their databases, and since every write that occurs during a compaction has to be replayed onto the new file after the copy before compaction completes, one can get into a state where a busy database either never actually finishes compacting, or has to temporarily block all writers just so it can get the damn job done without interruption. (It’s a similar problem to GC thrash.) We’ve also seen that, on low-end hardware like mobile devices, I/O bandwidth is limited enough that a running compaction can really harm the responsiveness of the _entire OS_, as well as cause significant battery drain. —Jens * Modifying/rewriting a single record requires rewriting the leaf node that points to it, which requires rewriting the parent node that points to the leaf, and this ripples all the way up to the root node. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 07:05:06PM +, Peter Da Silva wrote: > Step 2 seems rather expensive, even if you’re filtering out dead blocks in > the process. It's no more expensive than WAL checkpointing is today. You could always do what LMDB does to reuse free blocks in a DB and avoid having to checkpoint-and-rename-into-place, but that greatly complicates locking because readers have to lock the transaction that they are reading at. Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 01:59:07PM -0500, Nico Williams wrote: > The checkpoint process would look like this: > > - make a new file in the same directory > - copy the DB to the new file The copy would basically be copying all the live data as a single transaction on the new DB/WAL file. At the end there should be at most two transactions in the file: one for the schema, one for the data. > - rename the new file into place > - write the "closed, renamed" marker into the old file (which is still >open) > > This works on POSIX, and if you use the right CreateFileEx() options, it > works on WIN32. I'm referring here to readers that have the first file open... continuing to be able to read from it as long as they retain the open file handle, even after that file is deleted by the rename. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 8/4/17, 1:59 PM, "sqlite-users on behalf of Nico Williams" wrote: > The checkpoint process would look like this: > - make a new file in the same directory > - copy the DB to the new file > - rename the new file into place > - write the "closed, renamed" marker into the old file (which is still open) Step 2 seems rather expensive, even if you’re filtering out dead blocks in the process. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 06:49:42PM +, Peter Da Silva wrote: > On 8/4/17, 1:45 PM, "sqlite-users on behalf of Nico Williams" > n...@cryptonector.com> wrote: > > SQLite3's WAL is already log-structured. The main DB file isn't. > > So SQLite3 is a hybrid. But it doesn't have to be a hybrid. > > One issue I see with this is you’ll have to retain the old WALs as > long as they have any live data, or the checkpoint operation will have > to copy all the unmodified data in the log to the new WAL, or you’ll > have to keep a non-log structure containing all the relatively static > data that hasn’t been modified in the last “N” checkpoints. The checkpoint process would look like this: - make a new file in the same directory - copy the DB to the new file - rename the new file into place - write the "closed, renamed" marker into the old file (which is still open) This works on POSIX, and if you use the right CreateFileEx() options, it works on WIN32. Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 8/4/17, 1:45 PM, "sqlite-users on behalf of Nico Williams" wrote: > SQLite3's WAL is already log-structured. The main DB file isn't. So SQLite3 > is a hybrid. But it doesn't have to be a hybrid. One issue I see with this is you’ll have to retain the old WALs as long as they have any live data, or the checkpoint operation will have to copy all the unmodified data in the log to the new WAL, or you’ll have to keep a non-log structure containing all the relatively static data that hasn’t been modified in the last “N” checkpoints. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 06:43:08PM +, Peter Da Silva wrote: > On 8/4/17, 1:28 PM, "sqlite-users on behalf of Nico Williams" wrote: > > [...] > > A log-structured database, like a log-structured file system? Yes. Filesystems and databases are each other's dual. SQLite3's WAL is already log-structured. The main DB file isn't. So SQLite3 is a hybrid. But it doesn't have to be a hybrid. Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 8/4/17, 1:28 PM, "sqlite-users on behalf of Nico Williams" wrote: > Imagine a mode where there is only a WAL, and to checkpoint is to write a new > WAL with only live contents and... rename(2) into place. Such a mode would > a) be a 100% Copy-on-Write (CoW) mode, whereas currently WAL is only CoW > until a checkpoint operation comes along, b) have better read concurrency. A > special marker could be used to denote "this WAL is closed and replaced with > a checkpointed one", that way readers only have to stat/re-open when they see > this. A log-structured database, like a log-structured file system? ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 11:53:05AM -0500, Nico Williams wrote: > WAL mode still ends up having no read concurrency when it's time to > checkpoint the WAL. The same would happen with this concept. I don't > think this works well. Speaking of which... and I know I've mentioned this before and I risk being a broken record... Imagine a mode where there is only a WAL, and to checkpoint is to write a new WAL with only live contents and... rename(2) into place. Such a mode would a) be a 100% Copy-on-Write (CoW) mode, whereas currently WAL is only CoW until a checkpoint operation comes along, b) have better read concurrency. A special marker could be used to denote "this WAL is closed and replaced with a checkpointed one", that way readers only have to stat/re-open when they see this. It probably wouldn't take much to write such a thing. One nice thing about this is that block-level replication would be easy and fast because all writes would be append-writes: just send missing blocks until the file ID (st_dev, st_ino, and inode gen number) changes, then send the whole new file. (Not that block-level replication is an end-all-be-all. But that for some use cases it's very nice. Logical replication is better for other use-cases.) Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 08:50:05AM +0200, Luc DAVID wrote: > sqlite has WAL mode for better concurrency and this could maybe be used to > extend the number of writters: > > Do you think it would be possible to create a > MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is > currently running in order to allow more writers? > > The sqlite engine would then take care of dealing with all the WAL files > when reading data, backup...etc WAL mode still ends up having no read concurrency when it's time to checkpoint the WAL. The same would happen with this concept. I don't think this works well. Other approaches will work better. One would be to have a server process, with the SQLite3 API transparently using IPC to talk to it. Or else a single-process DB (connections from only one process allowed at any time) where internal thread synchronization could be used to manage locks (on rows/pages and a log used to ultimately put together a final transaction in the background so that other threads can keep making progress). Doing this sort of thing with multiple processes gets tricky fast, requiring shared memory, mutexes on shared memory, ... IPC to a server process makes everything much simpler, but there is an IPC performance penalty. The server process model works very well for PostgreSQL though... Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 12:58:43PM +0100, Simon Slavin wrote: > The problem you’re trying to fix is one of the big problems with > distributed databases. Nobody has found a good solution for it yet. It's impossible to solve for the eventually-consistent types. You just have to explicitly handle... eventual conflicts... eventually. For the rest, higher write concurrency is not the end of the world, but you have to be aware of it and you have to be careful. The RDBMS can only do so much to protect you automatically -- the more it does, the worse it will perform, but the less it does the more you have to be aware and code defensively. Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On Fri, Aug 04, 2017 at 09:33:31AM +0200, Eric Grange wrote: > The main problem with multiple writers would be in conflict resolution, > locking and deadlocks. There is plenty in the literature about this. You have to code more defensively, you may need things like "FOR UPDATE", etc. > Imagine a simple accounting transactions "update accounts set value = value > + 1 where ..." if run at the same time from 2 threads (or close enough), > then if you do not have conflict resolution in place you may end up > increase value by only 1 I write that like so: SELECT value FROM accounts WHERE ...; -- store this in a variable UPDATE accounts SET value = value + 1 WHERE ... AND value = :value_read_earlier; This is how one is supposed to get atomicity in LDAP too, incidentally. So, if you were creating a SEQUENCE kind of thing (see the "sequencer" thread this week too) then I'd return 1 + value_read_earlier. This approach works well and is portable! But it does require that you think about concurrency :( > 1) alice starts transaction > 2) bob starts transaction > 3) alice reads value, sees 10 > 4) bob reads value, sees 10 > 5) alice writes 10+1 > 6) bob writes 10+1 ooops Yeah, "don't do that"; see above. > And once you have more than one query, things become even more complicated. Things are already this complicated if you use PostgreSQL, SQL Server, Oracle, ... There's nothing new here, except that plenty of code wirtten with SQLite3 in mind may need to be modified to be safe with higher write concurrency enabled (so it probably shouldn't be enabled by default). > Even among full relational DBs, few manage these correctly. IME only > PostgreSQL and Firebird handle these correctly by default, for Oracle or > MSSQL you have to use special locking modes and transaction options with > significant performance penalties. Yes, you have to be aware of varying synchronization/locking features (e.g., "FOR UPDATE") and write concurrency semantics. Nico -- ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
> Le 4 août 2017 à 14:15, Richard Hipp a écrit : > > Another alternative is the newer server-process-edition branch > (https://sqlite.org/src/timeline?n=all&r=server-process-edition) which > you can read about here: > https://sqlite.org/src/artifact/0c6bc6f55191b690 Looks certainly promising! -- Best Regards, Meilleures salutations, Met vriendelijke groeten, Olivier Mascia, http://integral.software ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 8/4/17, Luc DAVID wrote: > Hello, > > I was thinking about a possible solution for sqlite "only single writer > is allowed at the same time" and database lock. > > sqlite has WAL mode for better concurrency and this could maybe be used > to extend the number of writters: The begin-concurrent branch (https://sqlite.org/src/timeline?r=begin-concurrent&n=all) allows you to say: BEGIN CONCURRENT; -- various database reads and updates COMMIT; And to do that simultaneously in two or more database connections, and have them all work. Except, the concurrent transactions may not overlap. That is to say, content written by one may not be read or written by another. If the transactions do overlap, the second one to try to COMMIT will get an SQLITE_BUSY_SNAPSHOT error and will be forced to abandon its transaction and start over. The begin-concurrent branch is in production use in high-stress environments. We have not merged that branch to trunk (yet) because it currently imposes extra overhead on all applications, even applications that do not use BEGIN CONCURRENT. Another alternative is the newer server-process-edition branch (https://sqlite.org/src/timeline?n=all&r=server-process-edition) which you can read about here: https://sqlite.org/src/artifact/0c6bc6f55191b690 -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
On 4 Aug 2017, at 11:43am, Luc DAVID wrote: > sqlite was not designed for this kind of access but It would be great to have > a higher level of concurrency The problem with these things is that you have SQLite trying to read the minds of he programmer and user. Consider two operations being done by different computers at the same time: UPDATE contacts SET phone = REPLACE (phone, '444', '555') INSERT INTO contacts VALUES ('Charles', '444 1234') The first one is done because an entire phone exchange got renumbered to make way for a new exchange. The second is a new contact. But the two operations were in overlapping transactions. When they’re both finished should the new contact have '444' or '555' ? Here’s another scenario. Suppose you have an invoice file so you can invoice those contacts, and the key of the invoice file is the rowid of the contact file. Two users each want to create an invoice for a new contact at the same time. The software running on their computers needs to insert the new contact, then find the rowid of the new contact so it can create an invoice for it. How can you arrange this when the transactions can overlap ? The problem you’re trying to fix is one of the big problems with distributed databases. Nobody has found a good solution for it yet. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
Yes this would be the case if no one closes the write transactions before reading. Would a possible solution be to use "read uncommited" or to include a kind of timestamp or autoInc identifier used internally by sqlite engine ? even if I am not sure it would be enough to avoid conflicts... sqlite was not designed for this kind of access but It would be great to have a higher level of concurrency Luc Le 04/08/2017 à 09:33, Eric Grange a écrit : The main problem with multiple writers would be in conflict resolution, locking and deadlocks. Imagine a simple accounting transactions "update accounts set value = value + 1 where ..." if run at the same time from 2 threads (or close enough), then if you do not have conflict resolution in place you may end up increase value by only 1 1) alice starts transaction 2) bob starts transaction 3) alice reads value, sees 10 4) bob reads value, sees 10 5) alice writes 10+1 6) bob writes 10+1 ooops And once you have more than one query, things become even more complicated. Even among full relational DBs, few manage these correctly. IME only PostgreSQL and Firebird handle these correctly by default, for Oracle or MSSQL you have to use special locking modes and transaction options with significant performance penalties. Eric On Fri, Aug 4, 2017 at 8:50 AM, Luc DAVID wrote: Hello, I was thinking about a possible solution for sqlite "only single writer is allowed at the same time" and database lock. sqlite has WAL mode for better concurrency and this could maybe be used to extend the number of writters: Do you think it would be possible to create a MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is currently running in order to allow more writers? The sqlite engine would then take care of dealing with all the WAL files when reading data, backup...etc The maximum of allowed writers could be set by a pragma or another mean (when opening the db) It seems a simply way to boost sqlite concurrency. Am I wrong on this point ? Best Regards Luc ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode
The main problem with multiple writers would be in conflict resolution, locking and deadlocks. Imagine a simple accounting transactions "update accounts set value = value + 1 where ..." if run at the same time from 2 threads (or close enough), then if you do not have conflict resolution in place you may end up increase value by only 1 1) alice starts transaction 2) bob starts transaction 3) alice reads value, sees 10 4) bob reads value, sees 10 5) alice writes 10+1 6) bob writes 10+1 ooops And once you have more than one query, things become even more complicated. Even among full relational DBs, few manage these correctly. IME only PostgreSQL and Firebird handle these correctly by default, for Oracle or MSSQL you have to use special locking modes and transaction options with significant performance penalties. Eric On Fri, Aug 4, 2017 at 8:50 AM, Luc DAVID wrote: > Hello, > > I was thinking about a possible solution for sqlite "only single writer is > allowed at the same time" and database lock. > > sqlite has WAL mode for better concurrency and this could maybe be used to > extend the number of writters: > > Do you think it would be possible to create a > MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is > currently running in order to allow more writers? > > The sqlite engine would then take care of dealing with all the WAL files > when reading data, backup...etc > > The maximum of allowed writers could be set by a pragma or another mean > (when opening the db) > > It seems a simply way to boost sqlite concurrency. > > Am I wrong on this point ? > > Best Regards > > Luc > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Thinking about a way to extend the number of writers in WAL mode
Hello, I was thinking about a possible solution for sqlite "only single writer is allowed at the same time" and database lock. sqlite has WAL mode for better concurrency and this could maybe be used to extend the number of writters: Do you think it would be possible to create a MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is currently running in order to allow more writers? The sqlite engine would then take care of dealing with all the WAL files when reading data, backup...etc The maximum of allowed writers could be set by a pragma or another mean (when opening the db) It seems a simply way to boost sqlite concurrency. Am I wrong on this point ? Best Regards Luc ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users