[sqlite] New word to replace "serverless"
Rowan Worth wrote:
> On Tue, 28 Jan 2020 at 06:19, Richard Hipp wrote:
>> Note that "in-process" and "embedded" are not adequate substitutes for
>> "serverless". An RDBMS might be in-process or embedded but still be
>> running a server in a separate thread. In fact, that is how most
>> embedded RDBMSes other than SQLite work, if I am not much mistaken.

Wait, really? AFAICS embedded means in-process, no IPC required to operate.

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite
Keith Medcalf wrote:
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
> performance increase. Changing the threading mode or the indirection
> level of the mutexes calls seems to have no significant effect.

Goes to show - publishing benchmark results without investigating why they
are what they are is mostly pointless. When you suspect mutex contention
is a significant factor, you should use something like mutrace to confirm
your suspicion first.

Fundamentally there ought to be no performance difference between running
a 64-threaded server on a 64-threaded CPU vs 64 single-threaded processes.
In practice, the single process with 64 threads ought to be slightly
faster, due to less context switch overhead between threads, but if
nothing else in the system is contending for CPU then context switching
shouldn't even be an issue.
Re: [sqlite] Multi threaded readers on memory sqlite cannot scale
Jens Alfke wrote:
>> On May 14, 2018, at 11:25 PM, Howard Chu wrote:
>>> Do you have a link to the currently updated version of this? Google
>>> gives me projects that haven't been updated since 2015.
>> That's the most recent version. So far I haven't seen any compelling
>> new features in subsequent SQLite versions to warrant a resync of the
>> code.
>
> It looks to be based on SQLite 3.7.16. New features since then include
> partial indexes, common table expressions, the next-gen query planner,
> deferred foreign keys, WITHOUT ROWID tables, row values, secure pointer
> passing for native functions, sqlite3_serialize(), and the upcoming
> UPSERT … and that's just from skimming through the release history. Plus
> of course all sorts of query planner improvements, and misc. bug fixes.

Many of those new features resulted in performance regressions, which is
the main reason I stayed away from them.

> Partial indexes are a must-have for my use case, so when I ran across
> SQLightning in 2016 I quickly gave up on it. There have also been three
> or four bug fixes in recent SQLite releases that fixed serious problems
> we were having.
>
> If you build a modified version of SQLite in such a way that it can't
> feasibly* be updated by anyone but you, and then have no plans to keep
> it up to date, it isn't a product anyone else can seriously use. It's
> more of a personal project, or a gauntlet thrown down to the SQLite
> team. Which is fine as it goes, but I don't think it's a good idea to
> suggest other people use it.
>
> —Jens
>
> * I've looked at the source. There's no version history in Git, no copy
> of the original SQLite source files, and no markers I could find in the
> source that show where your changes are. I quickly decided that trying
> to merge in a newer version of SQLite would be a waste of time.

Nonsense. https://github.com/LMDB/sqlightning/tree/mdb

You didn't look carefully enough, so you're in no position to offer
advice.
Re: [sqlite] Multi threaded readers on memory sqlite cannot scale
Peter Da Silva wrote:
> Now I'm even more confused, there's "sqlightning" and "sqlitening". Are
> these different names for the same project? It doesn't seem so. Which
> was intended?

https://github.com/LMDB/sqlightning

I was referring to this. ^^ Since it uses LMDB, and LMDB readers are
wait-free, you can run as many reads as you have CPU threads, without
blocking.

> On 5/14/18, 8:47 AM, "sqlite-users on behalf of Bernard Ertl" wrote:
>> This is the latest:
>> http://www.sqlitening.planetsquires.com/index.php?topic=9427.0
>> I contributed to the last SQLitening update. No one has reported any
>> issues that need fixing or updating since that update. It seems to be
>> working quite well/stable.
>>
>>> On 5/13/18, 6:48 AM, "sqlite-users on behalf of Howard Chu" wrote:
>>>> SQLightning
>>> Do you have a link to the currently updated version of this? Google
>>> gives me projects that haven't been updated since 2015.
Re: [sqlite] Multi threaded readers on memory sqlite cannot scale
Peter Da Silva wrote:
> On 5/13/18, 6:48 AM, "sqlite-users on behalf of Howard Chu" wrote:
>> SQLightning
> Do you have a link to the currently updated version of this? Google
> gives me projects that haven't been updated since 2015.

That's the most recent version. So far I haven't seen any compelling new
features in subsequent SQLite versions to warrant a resync of the code.
Re: [sqlite] Multi threaded readers on memory sqlite cannot scale
Clemens Ladisch wrote:
> Techno Magos wrote:
>> So, memory sqlite is not really usable with multiple threads (readers).
>> While one might expect that multiple readers of *memory* content could
>> scale even better than with file content.
>
> Concurrent accesses to the same in-memory data structures must be
> serialized. In shared-cache mode, the connections share the cache,
> while on-disk connections each have their own cache.
>
>> Is there some special mode possible to achieve scaling up throughput
>> with multiple threads for memory sqlite content?
>
> Put a DB file on a RAM disk. Or on a normal disk (with looser
> synchronous and journal_mode settings), and rely on the OS file cache.

Or just use SQLightning, which has no scalability limits for readers.
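Clemens' RAM-disk suggestion can be sketched with Python's stdlib sqlite3
module. This is only an illustration under stated assumptions: /dev/shm is
a Linux-specific tmpfs path (with a fallback to the ordinary temp dir),
and the file name "scratch.db" is made up for the example.

```python
import os
import sqlite3
import tempfile

# Keep the DB file on a RAM disk (Linux tmpfs at /dev/shm, an assumption;
# fall back to the normal temp directory elsewhere) and loosen the
# durability pragmas, relying on the OS cache instead of fsync.
ramdisk = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
path = os.path.join(ramdisk, "scratch.db")  # hypothetical file name

conn = sqlite3.connect(path)
conn.execute("PRAGMA synchronous=OFF")     # no fsync on commit
conn.execute("PRAGMA journal_mode=MEMORY") # rollback journal kept in RAM
conn.execute("CREATE TABLE IF NOT EXISTS t(x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()

# Each reader thread would open its own connection to the same file, so
# the connections do NOT share a cache (and thus not a cache mutex).
reader = sqlite3.connect(path)
value = reader.execute("SELECT x FROM t").fetchone()[0]
print(value)

conn.close()
reader.close()
os.remove(path)
```

Because each connection has a private page cache, readers scale in the way
shared-cache in-memory connections cannot.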
Re: [sqlite] Enabling MMAP in Android
J Decker wrote:
> On Tue, Dec 12, 2017 at 4:35 PM, Howard Chu wrote:
>> Martin Raiber wrote:
>>> On 12.12.2017 19:47 Simon Slavin wrote:
>>>> On 12 Dec 2017, at 6:27pm, Jens Alfke wrote:
>>>>> On Dec 12, 2017, at 5:46 AM, Simon Slavin wrote:
>>>>>> Before you answer that question, you should know that both Windows
>>>>>> and macOS have been proved to have serious bugs in their memory
>>>>>> mapping code.
>>>>> This has been brought up several times recently, but I’ve never
>>>>> seen any details given about exactly what was wrong with macOS’s
>>>>> mmap implementation. Does anyone have a pointer to authoritative
>>>>> information about this?
>>>> See this thread:
>>>> <http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html>
>>>> This is the thread which led to memory mapping being disabled for
>>>> writing on macOS, as discussed here:
>>>> <http://sqlite.1065341.n5.nabble.com/Re-Database-corruption-and-PRAGMA-fullfsync-on-macOS-td95366i20.html>
>>> There might also be a mmap bug in the Android 7.0 fuse layer:
>>> https://www.mail-archive.com/openldap-its@openldap.org/msg10970.html
>> There is definitely a bug in Android 7 fuse/mmap. The bug is
>> definitely not present when bypassing fuse, but only rooted devices
>> can bypass...
>
> My experience was that in the majority only rooted devices could use
> fuse, since /dev/fuse was rw--- . (Although not all.)

The /sdcard partition is always mounted through fuse because it's a
vfat/exfat filesystem that doesn't support owner/permission bits, and the
fuse driver imposes the Android security model on top of it. Many Android
devices no longer include a physical SDcard slot, but still have an
internal storage partition that's labeled /sdcard, and it still behaves
this way.
Re: [sqlite] Enabling MMAP in Android
Martin Raiber wrote:
> On 12.12.2017 19:47 Simon Slavin wrote:
>> On 12 Dec 2017, at 6:27pm, Jens Alfke wrote:
>>> On Dec 12, 2017, at 5:46 AM, Simon Slavin wrote:
>>>> Before you answer that question, you should know that both Windows
>>>> and macOS have been proved to have serious bugs in their memory
>>>> mapping code.
>>> This has been brought up several times recently, but I’ve never seen
>>> any details given about exactly what was wrong with macOS’s mmap
>>> implementation. Does anyone have a pointer to authoritative
>>> information about this?
>> See this thread:
>> <http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html>
>> This is the thread which led to memory mapping being disabled for
>> writing on macOS, as discussed here:
>> <http://sqlite.1065341.n5.nabble.com/Re-Database-corruption-and-PRAGMA-fullfsync-on-macOS-td95366i20.html>
> There might also be a mmap bug in the Android 7.0 fuse layer:
> https://www.mail-archive.com/openldap-its@openldap.org/msg10970.html

There is definitely a bug in Android 7 fuse/mmap. The bug is definitely
not present when bypassing fuse, but only rooted devices can bypass...
Re: [sqlite] Enabling MMAP in Android
Jens Alfke wrote:
> I’m skeptical about mmap being broken on Macs, since there are other
> production databases such as Realm* that use it heavily. (Though I am
> not sure whether Realm uses writeable mappings.)
>
> —Jens
>
> * and LMDB, but I am not sure if LMDB is in use on macOS.

LMDB is in common use on MacOS, no issues have been reported. On iOS
there's this nagging problem that the virtual address space is still
limited to 4GB, even on 64-bit systems.
Re: [sqlite] Read-only access which does not block writers
Nico Williams wrote:
> On Fri, Nov 25, 2016 at 09:08:37AM +0100, Florian Weimer wrote:
>> I think you misunderstood what I was asking (see the start of the
>> thread). I need to support Byzantine readers which do not follow the
>> locking protocol. Based on the documentation, LMDB uses locks to
>> implement MVCC and prevent premature page reuse.
>
> There's a good use case for an LMDB-like COW DB such that readers need
> never lock. That would require vacuuming by writing a new file and
> renaming it into place, which is a trade-off. One of the problems with
> LMDB's locks is that readers need write permission to the lock file,
> IIRC :( but at least it's a separate file.

We developed an alternate locking protocol for Postfix. It just uses
standard fcntl() locks. http://www.postfix.org/lmdb_table.5.html

Of course nothing comes for free - with this approach, writers don't
block readers, but readers block writers. That's the best you're going to
get without custom lock protocols like LMDB uses natively.
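The shared-vs-exclusive fcntl() scheme Howard describes (readers hold a
shared lock, writers need an exclusive one, so readers block writers but
not each other) can be sketched with Python's stdlib fcntl module. This is
NOT the actual Postfix/LMDB code; the lock-file name is hypothetical, and
it assumes a POSIX system.

```python
import fcntl
import os
import tempfile

events = []
path = os.path.join(tempfile.gettempdir(), "demo.lock")  # hypothetical name
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

# Readers take a shared lock: many may hold it at once, but a pending
# exclusive lock must wait for all of them ("readers block writers").
fcntl.lockf(fd, fcntl.LOCK_SH)
events.append("shared")
fcntl.lockf(fd, fcntl.LOCK_UN)

# Writers take an exclusive lock; it succeeds immediately here because no
# reader currently holds the shared lock. (Within one process, POSIX
# record locks never conflict with themselves, so a real contention demo
# needs two processes.)
fcntl.lockf(fd, fcntl.LOCK_EX)
events.append("exclusive")
fcntl.lockf(fd, fcntl.LOCK_UN)

os.close(fd)
os.remove(path)
print(events)
```

The trade-off Howard notes falls out directly: an exclusive lock request
cannot be granted while any shared lock is outstanding.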
Re: [sqlite] Read-only access which does not block writers
Florian Weimer wrote:
> On 11/24/2016 10:41 PM, Howard Chu wrote:
>> As a compromise you could use SQLightning, which replaces SQLite's
>> Btree layer with LMDB. Since LMDB *does* allow readers that don't
>> block writers.
>
> How does it do that? Does LMDB perform lock-free optimistic reads and
> retroactively verify that the entire read operation was consistent?
> The web page currently says that “readers need write access to locks
> and lock file”: <http://lmdb.tech/doc/>

Readers are lock-free/wait-free. Since LMDB uses MVCC, readers get their
own fully isolated snapshot of the DB, so no retroactive verification is
needed.

> The restriction on opening the database twice within the same process
> is something which we would have to work around, too.
>
> I see you are working on eliminating the key size limit, which is nice.

Yeah, should be available in a month or so in 1.0 release candidates.
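The MVCC idea behind Howard's answer can be illustrated with a toy
versioned store: each write publishes a new immutable version, and a
reader keeps using the version that was current when it began, so it never
needs locks or retroactive validation. This is NOT how LMDB is actually
implemented (LMDB uses copy-on-write B+tree pages); it is only a sketch of
why snapshot readers need no verification.

```python
class MVCCStore:
    """Toy MVCC store: every write publishes a new immutable snapshot."""

    def __init__(self):
        self.versions = [{}]  # list of immutable snapshots, oldest first

    def begin_read(self):
        # A reader pins the newest snapshot; later writes never mutate it.
        return self.versions[-1]

    def write(self, key, value):
        # Copy-on-write: build a new map, then atomically publish it.
        new = dict(self.versions[-1])
        new[key] = value
        self.versions.append(new)


store = MVCCStore()
store.write("a", 1)
snap = store.begin_read()   # reader starts here
store.write("a", 2)         # a concurrent writer publishes a new version
print(snap["a"], store.begin_read()["a"])  # → 1 2
```

The pinned snapshot still reports the old value while new readers see the
new one, which is exactly the isolation that makes verification
unnecessary.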
Re: [sqlite] Read-only access which does not block writers
Florian Weimer wrote:
> On 11/24/2016 02:54 PM, Richard Hipp wrote:
>> On 11/24/16, Florian Weimer wrote:
>>> I'd like to replace the use of Berkeley DB in RPM with SQLite. The
>>> scenario is special in the following way. There is no database
>>> server; all access goes directly to the database. Unprivileged users
>>> without write access to the RPM database are expected to run
>>> read-only queries against the database. Privileged users (basically,
>>> root) are expected to use locking to exclude concurrent writers. But
>>> read-only users should not be able to stop acquisition of a write
>>> lock. Is there a way to do this with SQLite?

Seems like quite a lot of burden to go this route. SQLite has a much
larger footprint than BDB, and much worse performance overall.

As a compromise you could use SQLightning, which replaces SQLite's Btree
layer with LMDB, since LMDB *does* allow readers that don't block
writers. But it would be more efficient to just use LMDB directly, and
not incur the overhead of the SQL translation layer.
Re: [sqlite] Parallel access to read only in memory database
Daniel Meyer wrote:
> We are interested in using sqlite as a read only, in memory, parallel
> access database. We have database files that are on the order of 100GB
> that we are loading into memory. We have found great performance when
> reading from a single thread. We need to scale up to have many parallel
> reader threads. Once the DB is created it never needs to be modified.
> How can we allow many reader threads on an in memory, write once read
> many times database and achieve multi-core performance? Is this
> possible with sqlite?
>
> Thanks for all the helpful responses. I have moved forward
> experimenting with using parallel readers on an in memory sqlite
> database. I have found that I get true concurrency (multi-core speed
> up) when I create a new connection to my database file on disk in every
> thread. I've verified this by running a single long query and then
> running the same query in several threads and ensuring the net time is
> the same as the single thread query time.
>
> In order to get parallel readers on an in memory database I first
> loaded the file into memory with:
>
>   rc = sqlite3_open_v2("file::memory:?cache=shared", &db,
>                        SQLITE_OPEN_URI | SQLITE_OPEN_READWRITE, NULL);
>
> I hold onto the db reference in the main thread after loading the data
> and don't close that connection until all the worker threads are done
> attempting to run my long query. I then spawn N threads, each creating
> their own connection like so:
>
>   rc = sqlite3_open_v2("file::memory:?cache=shared", &db,
>                        SQLITE_OPEN_URI | SQLITE_OPEN_READONLY |
>                        SQLITE_OPEN_NOMUTEX, NULL);
>
> I have experimented with the various flags; however, everything I do
> gives me serial performance. There is some kind of mutex locking the
> database, so that running my query say two times takes twice as long as
> running it once, whereas with the disk based approach, using this
> connection call in each thread:
>
>   rc = sqlite3_open("data.sl3", &db);
>
> makes the total time to run the query twice the same as running it just
> once.
>
> I am hypothesizing that since we are using 'cache=shared', and hence
> each thread is sharing the same cache, each read requires locking the
> database. What I would like is to get the same kind of behavior as we
> get with "file::memory:?cache=shared" wherein every time I open a new
> connection that connection points to the same memory, but does not
> actually involve sharing the cache, so that no global mutex locks the
> database on every read.
>
> I have put my test code in a gist. My C code is here:
> https://gist.github.com/danielrmeyer/fae54d5993f2800626c616e72782b5eb
> I generate the 1.5GB test database with this python 3.4 script:
> https://gist.github.com/danielrmeyer/bfa415256502471d1512f2155e76adc2
>
> I compiled the C code on my system with the following command:
>
>   gcc -std=gnu99 test.c -o test -lsqlite3 -lpthread
>
> (I did download the amalgamation and copied the sqlite.h and sqlite.o
> files into my cwd after building.) I apologize if the C code is not as
> clean as it could be. I'm primarily a Python programmer but figured I'd
> be more likely to get help with a C test case, so I did my best to hack
> this together. The Python GIL was confusing the situation in any case.
>
> A little background on what I am doing: I have several large datasets
> that I wish to serve up to customers to generate custom reports based
> on unique slices of the data. I am using a cluster of machines with
> .5TB of memory each, so loading all the data into memory is reasonable
> in my case. I've found that against my production work load I get
> massive speedups in single threaded tests against the in memory
> database relative to the disk version. In fact I have found that the
> single threaded sqlite in memory tests are faster than all the other
> database solutions I've looked at, so I am very excited about using
> sqlite; nevertheless I really need to scale to many cores. Also, my
> work load is highly random, so cache is not much help. I really want
> the data in memory. Any help is greatly appreciated.
>
> I have started experimenting with memory mapped io; however, I have not
> had much luck so far.

Use LMDB in SQLightning. Since LMDB reads acquire no locks, they scale
perfectly linearly across arbitrarily many CPUs. No other DB engine will
do what you're looking for. Everything else based on locking will
bottleneck as soon as you extend beyond a single CPU socket.

> I would not mind creating a fork of sqlite on github and hacking the
> code if someone could give me pointers on what needs to be modified to
> get this working. Certainly if there is an extra flag or URI I need to
> use to get concurrent in memory read access that would be great, but
> I'm willing to try and modify the source code and share it with the
> community if I can figure out how to get this going.
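Daniel's shared-cache setup can be reproduced from Python's stdlib sqlite3
module. This is only a sketch of the mechanism he describes: the database
name "memdb1" is made up, and it uses the documented
`file:NAME?mode=memory&cache=shared` URI form rather than the exact C
flags above. All connections with this URI attach to one shared in-memory
database, and with it one shared cache, whose mutex is the serialization
point he observed.

```python
import sqlite3

# Every connection opened with the same shared-cache URI sees the same
# in-memory database. "memdb1" is an arbitrary example name.
uri = "file:memdb1?mode=memory&cache=shared"

writer = sqlite3.connect(uri, uri=True)   # keeps the in-memory DB alive
writer.execute("CREATE TABLE t(x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()

reader = sqlite3.connect(uri, uri=True)   # second connection, same memory
value = reader.execute("SELECT x FROM t").fetchone()[0]
print(value)

reader.close()
writer.close()  # once the last connection closes, the DB disappears
```

The convenience of sharing one copy of the data is exactly what forces the
connections through a common cache, which is why per-connection caches (as
with separate connections to an on-disk file) scale while this does not.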
Re: [sqlite] Parallel access to read only in memory database
Jens Alfke wrote:
> On Oct 9, 2016, at 10:41 AM, Howard Chu wrote:
>> As for code freshness, I've seen no compelling new features from 3.8.x
>> onward that would improve performance so there's been no reason to
>> update further.
>
> Perhaps, but there’s important new functionality in newer versions,
> such as partial indexes and indexes on expressions. If it’s a personal
> project for you, or a tech demo, then it’s understandable for you to
> drop it when it stops being interesting; but if this is something
> intended for other people to use, they’re going to want to see it
> supported going forward (the way SQLCipher is, for instance.) It would
> be nice to add a note to the readme saying something like “FYI, this is
> based on SQLite 3.7.17 and there are no plans to sync with newer
> versions.”

As with any open source project, if users want to see something change,
the onus is on them to initiate those changes. Nobody associated with an
open source project is ever obligated to proactively implement anything,
particularly things that no user has asked for yet. SQLightning has many
satisfied users already and none of them have requested the features you
mentioned. As for myself, it works for me in my personal builds of
Mozilla Seamonkey.

Lack of development activity on a project doesn't mean it's dead and
disused - it just means the project has accomplished its goals and its
users are content.
Re: [sqlite] Parallel access to read only in memory database
Jens Alfke wrote:
> On Oct 9, 2016, at 8:15 AM, Howard Chu wrote:
>> Use SQLightning, it's designed specifically for write once read many
>> workloads.
>
> "SQLite3 ported to use LMDB instead of its original Btree code” —
> sounds great, and the performance figures quoted in the readme are
> impressive. But the source code appears to be a modified version of
> SQLite’s source, which would make it very difficult to keep in sync
> with SQLite, and (as Domingo pointed out) the last commit is three
> years old and seems to be based on SQLite 3.7.17. So far this looks
> like an exciting proof-of-concept, but not something I’d use in a real
> project. (By comparison, SQLCipher is also released as a modified copy
> of SQLite, but they sync with SQLite regularly; the latest version from
> this April is based on 3.11.0.)
>
> It would be best if this were implemented as a separate plugin, but as
> I’m not familiar with the innards of SQLite, I’ll assume that simply
> wasn't feasible. (I know SQLite supports VFS plugins, but I think those
> just operate at the paging layer, below the b-tree.)

SQLite3's code structure is monolithic, not modular, so there is no clean
way to replace its underlying Btree layer. It's necessarily a
hack-and-slash proposition.

As for code freshness, I've seen no compelling new features from 3.8.x
onward that would improve performance, so there's been no reason to
update further. Many of the SQLite performance enhancements from 3.8 are
in its own Btree code, which is entirely ripped out in SQLightning and
thus irrelevant. On the other hand, there have been multiple regressions
(performance and otherwise) in subsequent releases, most often in the
query planner, which I've chosen to avoid. E.g.:

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg85558.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86191.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86901.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg89666.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg91201.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg92189.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg99646.html
Re: [sqlite] Parallel access to read only in memory database
Daniel Meyer wrote:
> We are interested in using sqlite as a read only, in memory, parallel
> access database. We have database files that are on the order of 100GB
> that we are loading into memory. We have found great performance when
> reading from a single thread. We need to scale up to have many parallel
> reader threads. Once the DB is created it never needs to be modified.
> How can we allow many reader threads on an in memory, write once read
> many times database and achieve multi-core performance? Is this
> possible with sqlite?

Use SQLightning, it's designed specifically for write once read many
workloads.
Re: [sqlite] Why so much I/O ? Can sqlite be improved ?
Domingo Alvarez Duarte wrote:
> Hello Simon !
>
> I already did it without using "wal" and the result was the same. And
> even, to my surprise, in one try I stopped at the middle, performed an
> "analyze", and the performance deteriorated a lot, to the point that I
> needed to delete the stats tables to get back the better performance
> without "analyze".
>
> I also tried with the lsm module and got a bit better performance but
> with an irregular timing and a bigger disk usage (20%). Also tested
> with lmdb with an astonishing insertion rate but with a lot more disk
> usage and irregular timing.

Using LMDB the VACUUM command is supposed to be a no-op; at least that's
how I intended it. Since LMDB deletes records immediately instead of
leaving tombstones, there is nothing to vacuum.

> Also tested with leveldb with a worse performance and almost twice the
> disk space usage. The data distribution in some tables seems to fall
> into the worst corner cases for btrees.
>
> Cheers !
>
> On 01/10/16 18:26, Simon Slavin wrote:
>> On 1 Oct 2016, at 10:18pm, Domingo Alvarez Duarte wrote:
>>> About the vacuum I also understand the need to rewrite the whole
>>> database but I'm not sure if it's really necessary to do almost 5
>>> times the database size in both reads and writes (also an equivalent
>>> amount of I/O happened during insertions).
>>
>> Can you try it without
>>
>>   db.exec_dml("PRAGMA wal_checkpoint(FULL);");
>>
>> and see if that improves the time? That's the only thing I can see.
>> You're using a nested INSERT OR IGNORE command I'm not familiar with.
>>
>> Simon.
Re: [sqlite] Sync journal's directory per transaction? or until database closed? in PERSIST mode
Graham Holden wrote:
> Original message
> From: Dan Kennedy
> Date: 26/05/2016 18:04 (GMT+00:00)
> To: sqlite-users@mailinglists.sqlite.org
> Subject: Re: [sqlite] Sync journal's directory per transaction? or
> until database closed? in PERSIST mode
>
>> On UNIX, it's possible to delete a file from the file-system while
>> another process has it open. In this case the other process can
>> continue reading and writing its file-descriptor as normal, but the
>> data is stored in memory only, not on disk (since the directory entry
>> has been deleted). Once the process exits or closes the
>> file-descriptor, the data is lost.
>
> It probably doesn't affect the argument of why SQLite does what it
> does, but I thought (though it's been a long time since I poked around
> UNIX file-systems) that data can still be written to (or read from)
> disk because the i-node chain is still present; it's just that there's
> no name by which another process can access it. Presumably, the final
> close will release the disk blocks of a marked-as-deleted file.

Correct. "The data is stored in memory only" is completely impractical -
you can still write much more data than will fit in RAM. One common
mistake of new Unix admins is to delete large log files from /var/spool
when they're low on disk space; if they do this without telling syslog to
close the files first, the space remains in use and can't be freed until
syslog is killed and restarted.
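The unlink-while-open behavior Graham and Howard describe can be
demonstrated in a few lines of stdlib Python on a POSIX system: the data
remains on disk and accessible through the open descriptor, even though no
directory entry refers to the inode any more.

```python
import os
import tempfile

# Create a temp file, write to it, then remove its only directory entry.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)

assert not os.path.exists(path)     # no name refers to the inode any more
assert os.fstat(fd).st_nlink == 0   # link count is zero, but inode lives on

# The open descriptor still works; the data is on disk, not just in RAM.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)
print(data)

os.close(fd)  # now the kernel finally frees the disk blocks
```

This is also why the deleted-log-file space in Howard's syslog example
cannot be reclaimed until the daemon closes its descriptors.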
[sqlite] Make mmap_size dynamic?
Mikael wrote:
> On Tuesday, 3 May 2016, Richard Hipp wrote:
>> On 5/2/16, Mikael wrote:
>>> If I just force it on (by hacking the build script), as long as
>>> mmap_size always is 2^63, will Sqlite access the file via memory
>>> accesses only, and never using fread/fwrite, which would lead to
>>> undefined behavior because of the absence of a UBC?
>>
>> SQLite only reads using mmap. Writes always happen using the write()
>> (or pwrite()) system call. So it will never work on OpenBSD.
>>
>> The database file is mmap-ed read-only. This is so that stray
>> pointers in the application cannot trivially corrupt the database
>> file.
>
> Aha.
>
> What do you say about the idea of a memory access only mode, enabled
> with SQLITE_MMAP_NO_FILEIO on any OS, for the daring users who want
> that?

It's certainly an avenue to getting that functionality in OpenBSD. In
general, e.g. on Linux, there's not much to gain from it. There is a
small performance gain when all of your data fits in RAM. There is a
larger performance loss when your DB is larger than RAM - every time you
touch a page to write to it, if it's not already memory-resident, the OS
faults it in from disk. It's a wasted page fault if you were simply going
to overwrite the entire page (which e.g. LMDB does). If you update pages
in-place, instead of doing COW as LMDB does, it may or may not be a wash;
I haven't measured that use case.
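For readers who want to try the existing read-side mmap support (as
opposed to the hypothetical SQLITE_MMAP_NO_FILEIO mode discussed above),
the mmap_size pragma is set per connection. A sketch using Python's stdlib
sqlite3 module; the file name and the 256 MiB limit are arbitrary, and the
effective value may be capped, or forced to 0, by the library's
compile-time SQLITE_MAX_MMAP_SIZE.

```python
import os
import sqlite3
import tempfile

# Fresh, empty file so we get a new database.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t(x)")
conn.commit()

requested = 256 * 1024 * 1024  # 256 MiB mapping limit (arbitrary example)
# The pragma returns the value actually in effect for this connection:
# equal to the request if allowed, smaller if capped, 0 if mmap is disabled.
effective = conn.execute(f"PRAGMA mmap_size={requested}").fetchone()[0]
print(effective)

conn.close()
os.remove(path)
```

Per Richard's explanation above, even with a large mmap_size only reads go
through the mapping; writes still use write()/pwrite().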
[sqlite] Changing the default page_size in 3.12.0
Jim Callahan wrote:
> Is 4096 bytes a large enough page size?
>
> Apparently the disk drive industry has shifted from 512 byte sectors to
> 4096 byte sectors.
> http://tilt.lib.tsinghua.edu.cn/docs/tech/tp613_transition_to_4k_sectors.pdf
>
> Should SQLite maintain a 1:1 ratio between page size and sector size?
> Or should the page size be a multiple of the sector size? Say 2:1 (8096
> or 8K)? Or 4:1 (16K)?
>
> What sizes do other databases use? (SQL Server and Postgres both use
> 8096 default)

You mean 8192.

> For years, virtual machines (VM) have used 4K pages (I think this
> started with IBM VM/370); while disk drives had 512 byte sectors (an
> 8:1 ratio).
>
> With a 2:1 ratio, in terms of seek time, one gets the second page for
> free.
>
> Would 8096 bytes (8K) be too much for a multi-tasking embedded device
> (such as a smart phone)?

You shouldn't even be discussing a hardcoded number. The page size should
be equal to the page size of the underlying memory management system: 4K
on common x86 systems, 8K on SPARC, etc. Choosing a number smaller than
this will cost you in RMW ops whenever the filesystem tries to do an
update. Choosing a number larger than this is generally going to waste
memory.

> Are there any benchmarks?
>
> Jim
>
> On Fri, Mar 4, 2016 at 10:48 AM, Richard Hipp wrote:
>> The tip of trunk (3.12.0 alpha) changes the default page size for new
>> database files from 1024 to 4096 bytes.
>>
>> https://www.sqlite.org/draft/releaselog/3_12_0.html
>> https://www.sqlite.org/draft/pgszchng2016.html
>>
>> This seems like a potentially disruptive change, so I want to give
>> you, the user community, plenty of time to consider the consequences
>> and potentially talk me out of it.
>>
>> The "Pre-release Snapshot" on the download page
>> (https://www.sqlite.org/download.html) contains this change, if you
>> want to actually evaluate the latest changes in your application.
>>
>> We hope to release 3.12.0 in early April, or maybe even sooner, so if
>> you want to provide feedback, you should do so without unnecessary
>> delay.
>>
>> D. Richard Hipp
>> drh at sqlite.org
[sqlite] Fwd: Re: SQLite and Thunderbird
Richard Hipp wrote: > I suppose that Thunderbird was making use of the fts3_tokenizer() > interface, which has been removed from standard builds due to security > concerns, as of version 3.11.0. You can reenable that feature at > compile-time by building with -DSQLITE_ENABLE_FTS3_TOKENIZER. See the > last bullet (the only bullet under the "Backwards Compatibility" > heading) of the release notes at > https://www.sqlite.org/releaselog/3_11_0.html for links to further > information. > > At this time, you basically have two options: > > (1) Compile your system sqlite3.so library using > SQLITE_ENABLE_FTS3_TOKENIZER and hope that none of the applications > that link against this library use it in such a way that the > fts3_tokenizer() could present a security vulnerability. > > (2) Statically link against a version of SQLite that you compile > yourself. SQLite is a single file of C code ("sqlite3.c") so making > it a part of the project source tree is not a big deal. > > Option (2) seems like the best choice to me since that guarantees that > Thunderbird will continue to operate regardless of what historical > version of sqlite3.so happens to be installed (or not installed) on > the system and regardless of the compile-time options used to create > that sqlite3.so. (For example, what if somebody installs a new > sqlite3.so that omits full-text search?) Static linking removes a > dependency and makes Thunderbird more robust. Thunderbird has *always* used its own statically built sqlite, just like all other Mozilla software. In fact, it has more than one copy: https://hg.mozilla.org/mozilla-central/file/918df3a0bc1c/db/sqlite3/src https://hg.mozilla.org/mozilla-central/file/918df3a0bc1c/security/nss/lib/sqlite > > On 2/26/16, Steven Haigh wrote: >> Dear sqlite-users list, >> >> I'd like to try and get some pointers on the following issue as >> documented on the following included BZ issues. >> >> Please CC me as I'm not subscribed to this list. 
>> >> >> Forwarded Message >> Subject: Re: SQLite and Thunderbird >> Date: Fri, 26 Feb 2016 10:06:25 +0100 >> From: Jan Staněk >> Organization: Red Hat >> To: Steven Haigh >> CC: nils at redhat.com, stransky at redhat.com >> >> Hi, >> I presume that this is a general Thunderbird issue, not a Fedora-specific >> one. If so, I would suggest asking at >> sqlite-users at mailinglists.sqlite.org, they are usually quite helpful. >> >> Regards, >> Jan >> >> On 26.2.2016 at 07:10, Steven Haigh wrote: >>> Re: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1310864 >>> https://bugzilla.redhat.com/show_bug.cgi?id=1311032 >>> >>> Hi all, >>> >>> Just trying to open a channel of communication regarding these bugs. >>> >>> While I believe thunderbird uses a format of call that is deprecated in >>> the newer SQLite packages, it is not ideal to statically compile >>> thunderbird against sqlite to make it work (which I believe is the >>> current fix). >>> >>> Any suggestions on a long-term fix? >>> >> >> >> -- >> Jan Stanek - Red Hat Associate Developer Engineer - Databases Team >> >> >> >> -- >> Steven Haigh >> >> Email: netwiz at crc.id.au >> Web: https://www.crc.id.au >> Phone: (03) 9001 6090 - 0412 935 897 >> >> >> >> > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] IS a SQLite db of small size as good as reliable cache?
GB wrote: > So where have you got this wisdom from? It's just plain Bullshit! > > Just as most cache managers do, Windows cache manager uses some sort of LRU > caching scheme. So all data once read from file is kept in memory until either > some memory pressure occurs or it is simply pushed out by newer data. No. Windows will toss clean pages out even in the total absence of memory pressure. It is moronic, but true. You can repeat the steps I outlined for yourself and see. https://groups.google.com/d/msg/comp.os.ms-windows.nt.misc/449tdNYPoX0/it4cx8Gvv2AJ > Actually, from what I've experienced, I'd recommend to size sqlite's memory to > fit for your largest transactions and leave most of the caching to the OS. > > And yes, unlike in many *NIX implementations, Windows' FlushFileBuffers() call > DOES issue according write commands to the hardware, so setting PRAGMA > synchronous ON/OFF makes a big difference in write performance on Windows > systems. > > Howard Chu schrieb am 30.01.2016 um 23:23: >> dpb wrote: >>> Dear Community, >>> >>> Assuming that SQLite would arrange the database table contents on page >>> boundary and >>> >>> 1) if db is small let us say maximum 5MB, >>> >>> 2 further if such a db is part of windows service / application running >>> continuously then may be pages will remain in the memory under normal >>> conditions. >>> >>> Q1) Operating System like Windows would cache the series of pages, is this >>> behavior not almost same as that of Cache? >> The Windows cache manager is quite aggressive at evicting cached pages from >> RAM. It used to be tunable back in Win2000, but none of those tuning knobs >> survived past WinXP. Generally, if you access some data, leave it for more >> than 5-10 seconds, don't expect to be able to reference it again without >> incurring a hard page fault. 
>> >> Note that the cache eviction runs quite frequently - once every 5 seconds or >> so, and evicts pages regardless of whether there's any memory pressure in the >> system. It's quite possibly the stupidest cache manager ever written. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Bug: Successfully committed transaction rolled back after power failure
Stephen Chrzanowski wrote: > @Rowan; > > First off, whether the OS or SQLite is ACID or not, if you pull the plug on > your hardware, all bets are off on whether it'll BOOT, let alone recover a > single transaction. I get that this could be a useful tool when doing > disaster proofing, but, at that stage in the game of bulletproofing, you > can't win every battle, and you're running into that at 100 miles an hour. Your expectations are pretty low. On a properly configured Unix host, there's no reason for a powerfail to prevent a successful reboot. E.g., if you mount boot and root filesystems as read-only filesystems, they can never get corrupted. If you're using modern filesystems for your writable partitions (e.g., FSs with journaling) then there's also no reason for them to fail to come back online. So it just comes down to your application code being reliable. I should note that SQLightning has none of the problems being described in this thread - in its default mode, it is full-ACID and a powerfail cannot lose data or corrupt the database. And it does all this while being at least 30% faster on writes than vanilla SQLite. Yes, the OS could have bugs. Yes, the hardware could physically fail. That's pretty rare though; HDDs R/W heads auto-retract on powerfail so unless the entire mechanism actually jammed, there's no way for a powerfail to cause a head crash or any other destructive event. Bottom line - if your OS reboots successfully, there's no excuse for your database to not also come up successfully, fully intact. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] IS a SQLite db of small size as good as reliable cache?
James K. Lowden wrote: > On Sat, 30 Jan 2016 22:23:24 + > Howard Chu wrote: > >> Note that the cache eviction runs quite frequently - once every 5 >> seconds or so, and evicts pages regardless of whether there's any >> memory pressure in the system. It's quite possibly the stupidest >> cache manager ever written. > > Any insight into what they were thinking? Back when I used Windows > daily, it used to annoy me that every morning the machine had to warm > up again, to revive the state I'd left it in the night before. In > NetBSD I learned that unused memory is unused, so why not use it? As I understand it, the main rationale is reliability - they don't trust their OS to stay up longer than 3 seconds after any particular write operation. There's a lot more information here https://groups.google.com/forum/#!topic/comp.os.ms-windows.nt.misc/449tdNYPoX0 That's from 2005 and unfortunately the Windows kernel variables for cache tuning no longer exist. A lot of the relevant info no longer exists on the original websites either, but I was able to pull one up from the web archive: https://web.archive.org/web/20010825042328/http://www.sysinternals.com/ntw2k/source/cacheman.shtml > I have a feeling that "fast app launching" is the reason, as though > Windows users were excitedly punching the Start button with a > stopwatch. But maybe there's more to it than that? It may well be a user-oriented philosophy. It is certainly not a developer-oriented approach. It was my frustration with slow build times on Windows that led me to investigate this in the first place. https://bugzilla.mozilla.org/show_bug.cgi?id=294122#c69 A lengthier explanation of how it works is online here https://msdn.microsoft.com/en-us/library/bb742613.aspx but there's pretty much no information there that's actionable - aside from LargeSystemCache there's no tuning knobs left. -- -- Howard Chu CTO, Symas Corp. 
http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] IS a SQLite db of small size as good as reliable cache?
dpb wrote: > Dear Community, > > Assuming that SQLite would arrange the database table contents on page > boundary and > > 1) if db is small let us say maximum 5MB, > > 2 further if such a db is part of windows service / application running > continuously then may be pages will remain in the memory under normal > conditions. > > Q1) Operating System like Windows would cache the series of pages, is this > behavior not almost same as that of Cache? The Windows cache manager is quite aggressive at evicting cached pages from RAM. It used to be tunable back in Win2000, but none of those tuning knobs survived past WinXP. Generally, if you access some data, leave it for more than 5-10 seconds, don't expect to be able to reference it again without incurring a hard page fault. Note that the cache eviction runs quite frequently - once every 5 seconds or so, and evicts pages regardless of whether there's any memory pressure in the system. It's quite possibly the stupidest cache manager ever written. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Bug: Successfully committed transaction rolled back after power failure
Simon Slavin wrote: > > On 28 Jan 2016, at 1:38pm, Bernard McNeill wrote: > >> === >> Like the user reading "saving OK" and throwing away the >> Post-It with the original information >> === >> >> This is exactly my concern. >> The user throwing away the Post-It is entirely reasonable if he sees a >> message like that. >> >> Do you happen to know if Linux/Debian (which I think uses a journalling >> filesystem) carries this risk? > > The problem is not at the software level. Not true. There *is* a problem at the software level - on Linux, current BSD (and apparently also on QNX) you must fsync the containing directory when you make changes to the contents of a directory (create/delete/rename files). This is above and beyond whatever lies the hardware layer may tell you. It's a documented requirement in Linux, at least. It is also independent of whether or not the filesystem uses journaling. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Bug: Successfully committed transaction rolled back after power failure
Richard Hipp wrote: > On 1/25/16, Howard Chu wrote: >> >> This is actually quite an unusual requirement; on older Unix systems you >> couldn't even *open* a directory, let alone obtain write access to it or >> fsync it. > > Yeah. When the SQLITE_DISABLE_DIRSYNC compile-time option is present, > we disable the directory sync logic for this reason. Some unixes > (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work. But Linux, > MacOS, and *BSD all work without it, so I thought I'd just not bring > that up... I would have to say this is a BSD OS bug as it breaks the guarantees stated in the manpages. I.e., rename() and unlink() syscalls are documented to be atomic, and fsync() doesn't say anything about being needed to sync a directory. http://www.unix.com/man-page/FreeBSD/2/fsync http://www.unix.com/man-page/FreeBSD/2/unlink http://www.unix.com/man-page/FreeBSD/2/rename/ I no longer have BSD source code on hand but I'd bet that when those manpages were written, all directory modifications in the BSD ffs were always synchronous. Linux obviously changed this but at least their fsync() manpage documents the behavior. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Bug: Successfully committed transaction rolled back after power failure
Matthias-Christian Ott wrote: > On 2016-01-25 16:47, Richard Hipp wrote: >> On 1/25/16, Matthias-Christian Ott wrote: >>> >>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA >>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a >>> transaction that it said to be committed depending on the VFS? >> >> Sort of. This appears to be true if you are running on QNX and you >> lose power (or do a hard reset) shortly after the transaction commits. >> It might be the case on other OSes/filesystems but it has never before >> been observed. >> >> This is not new behavior. This is apparently what SQLite has been >> doing for 15 years, across quadrillions of transactions on many >> billions of devices, and it has never before caused any issues, until >> just recently when Mr. Meinlschmidt upgraded to a newer version of >> QNX. > > So it would make sense to add a note that you should check whether your > target VFS and target operating environment needs an fsync after a > journal commit if you want to use this journal mode. Would it be > possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux > distributions package SQLite and therefore not every application can > compile SQLite with different options. The question isn't just whether "an fsync is needed" with journal mode - the question is *which* fsync is needed? The issue here is that file creation/deletion/rename ops require an fsync *on the containing directory*. This is actually quite an unusual requirement; on older Unix systems you couldn't even *open* a directory, let alone obtain write access to it or fsync it. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] POSIX and MMU (was Re: out of the void: xDlSym)
Scott Robison wrote: > On Sat, Jan 16, 2016 at 1:58 PM, James K. Lowden > wrote: > >> On Fri, 15 Jan 2016 21:41:41 -0500 >> Richard Damon wrote: >> >>> there are machines where it doesn't work (you just need a larger >>> program space than data space). >> >> Huh. An example of which is the "medium model" of the Intel 8086: >> 20-bit code pointers and 16-bit data pointers. A machine for which C >> compilers existed, and on which no Posix system will ever run (because >> it lacks an MMU). Thanks for that. >> > > Sorry for the OT diversion, but I'm just curious as I don't have historical > POSIX standards for reference. Does POSIX really *require* an MMU? > Certainly Unix like systems were written for 8086 class computers, but > given that POSIX was first standardized in 1988 I'm just curious as to > whether or not an MMU is a requirement or just really nice to have. ST-Minix ran on MC68000 - no MMU. POSIX API only defines a programming model, it doesn't mandate how it gets implemented under the covers. An MMU *can* make some things easier, but we had fork/exec etc. even without it. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?
Dan Kennedy wrote: > On 11/18/2015 03:37 PM, Török Edwin wrote: >> On 11/17/2015 12:11 AM, Deon Brewis wrote: >>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't >>> usable in any way shape or form. It will inevitably lead to catalog >>> corruption if you hard-reboot OSX, even without the database or application >>> open. >>> >> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7 >> and Debian Jessie, but after 24-48h of continuous writes to the DB I always >> get a corruption: >> SQLite result 0xb: database corruption at line 76915 of [767c1727fe] >> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks >> WHERE revision_id=:revision_id] >> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed: >> (code 0xb: database disk image is malformed) database disk image is malformed >> >> After this 'pragma integrity check' says: >> row 90814 missing from index sqlite_autoindex_revision_ops_1 >> wrong # of entries in index sqlite_autoindex_revision_ops_1 >> row 1046646 missing from index idx_revmap >> row 1046646 missing from index sqlite_autoindex_revision_blocks_1 >> wrong # of entries in index idx_revmap >> wrong # of entries in index sqlite_autoindex_revision_blocks_1 >> >> There are no reboots involved, just multiple processes accessing a WAL DB. >> Without mmap I/O I've never seen corrupted DBs in our application. > > As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file is > written using plain old write(), just as in non-mmap mode: That's the safest way to use mmap, but keep in mind that this requires a unified buffer cache and systems like OpenBSD still don't have that, so this approach will cause corruptions on systems like that. > >http://sqlite.org/src/info/67c5d3c646c8198c > > It would be interesting to know if this clears the problem in your > environment. -- -- Howard Chu CTO, Symas Corp. 
http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] MMAP performance with databases over 2GB
Roger Binns wrote: > It would also be really nice if there wasn't a 2GB mmap limit on 64 > bit machines. The database would fit in my RAM around 4 times, and in > the address space more times than there are grains of sand! Yea I > know this isn't very Lite ... SQLightning has no such limit... https://github.com/LMDB/sqlightning there's nothing "heavy" about making maximum use of mmap. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Mozilla wiki 'avoid SQLite'
Simon Slavin wrote: > The thing I always found interesting about SQL was that it picks three > English words, INSERT, DELETE, UPDATE, and says that that's all you need to > do. And it's right ! Is there something special about the 'three-ness' of > database operations ? Or are you meant to think of it as two writing > operations (INSERT, DELETE) and a convenience operation which combines them > (UPDATE) ? If there was another word, what would it be ? REPLACE ? > DUPLICATE ? LDAP/X.500 has Add/Delete/Modify as well. (It also has Rename, which doesn't really make sense for a tabular data store, but is useful for a hierarchical data structure.) > Also, why is there only one English word needed for reading operations ? > What would a database language look like if it has more than one word ? > Would there be a difference between FIND and SCAN ? X.500 has three separate operations Read, List, and Search. (Read = retrieve contents of a single entry, List = list the names of the immediate children of an entry, Search = search for any entries matching a filter.) LDAP combined all of these functions into a single Search operation. It's often considered to be a mistake on the part of the LDAP designers. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] Enabling mmap on OpenBSD: Anything really in the way, and how do?
Mikael wrote: > You who know SQLite all the way through, would either 1) or 2) or something > else still make a problem on OpenBSD? > > > If not, how do I use SQLite with all-file memory mapping on OpenBSD: I just Use SQLightning. https://github.com/LMDB/sqlightning -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] How do non-SQLite DBMS communicate?
Drago, William @ CSG - NARDA-MITEQ wrote: > All, > > I'm not sure how to ask this question. > > When using SQLite the application program accesses the SQLite DBMS > via its .dll file. When using something like Oracle Express (a local DBMS) the application program is communicating with Oracle Express via some sort of network protocol even though there's no network involved. What is that called? > > My reason for asking is, I'm writing a short white paper describing > my use of SQLite (to encourage other engineers where I work to use it too) and I'm trying to explain the differences between SQLite and other local database systems. So, SQLite databases are accessed via .dll where as other local databases run a server that is accessed via ??? The generic term is interprocess communication, IPC. You should be able to search on those keywords to find out anything else you want to know. SQLite is in the class known as "embedded databases" because its code is embedded in the application code, instead of running in a separate process. There are many other DBs of this class but most of them are lower level (key value stores) instead of offering the SQL data model. Berkeley DB is another well-known example of this type, as is LMDB. Many embedded DBs not only run within a single process, but also can only support read-write access to a database's files from a single process (i.e., no support for multi-process concurrency). Berkeley DB and LMDB are somewhat rare since they support transactional multi-process concurrency. > > Can anyone help me fill in those question marks? > Thanks, > > -- > Bill Drago > Senior Engineer > L3 Narda-MITEQ<http://www.nardamicrowave.com/> > 435 Moreland Road > Hauppauge, NY 11788 > 631-272-5947 / William.Drago at L-3COM.com<mailto:William.Drago at L-3COM.com> -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
[sqlite] sqlite3 (or sqlite4) performance on NFS
Stephen Chrzanowski wrote: >>From what I understand; > - Read-Only data > - Data doesn't change frequently > - Central repository for data > - Network latency causing issues > > My two cents on this is to keep a database revision ID kicking around and > do a SQLite backup of the remote data to a local storage medium. At > application launch, check the local version of the database, then check the > NFS version, and if there is a mismatch or a local copy doesn't exist, have > the application ask (Or force if no local copy exists) to copy the data > from remote to local, then read data from the local source. This will be a > bit of a PITA if you're talking gigabytes of storage on a saturated 100mbit > network or if drive space is limited. (I love my quiet GigE network) Congratulations, you've just reinvented AFS, but without any of its convenience. Seriously - there are many other distributed filesystems out there, all designed because users keep running into the same deficiencies of NFS, over and over again. Please, can we stop reinventing this wheel now? -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: [sqlite] presentation about ordering and atomicity of filesystems
Scott Robison wrote: On Sat, Sep 13, 2014 at 2:24 PM, Howard Chu wrote: Scott Robison wrote: A couple of academic thoughts. 1. If one wanted to embed the journal within the database, would it be adequate to reserve a specific page as the "root" page of the journal, then allocate the remaining pages as normal (either to the journal or the main database)? This does leave the big hole problem so it may still not be ideal, but it would give you a known location to find the beginning of the journal without doubling the database size or requiring an extra file. Starting with a known location is definitely a step in the right direction. 2. Building on 1, could sparse files be used to accomplish this? Seek to "really big constant offset" and do all journaling operations at that point, allowing the operating system to manage actual disk allocation? If We're talking about implementing a filesystem. "the operating system" is your own code, in this case, you don't get to foist the work off onto anyone else. No, Simon's original question was to the effect of why doesn't SQLite just use the already open database file for journaling purposes as well. OK, maybe I missed that, but I thought that question itself arose from how to use SQLite to implement a filesystem, on a raw partition. And the answer to that question (operating inside a raw partition) could apply equally well to operating inside a single file. If you preassign a fixed maximum size to the file, you could e.g. reserve the tail of the file for the journal, growing backward toward the head of the file, while the main data grows the usual direction from the head of the file toward the tail. This would basically be your (2) above. On HDDs this approach would have horrible seek latencies but it could work OK on SSDs. The other point though - like the existing journaling filesystems, you should not limit yourself to using a single file/storage device. 
Allow the option of storing the journal somewhere else - the performance potential is worth it. My point 1 was in response to the need to know where the journal file is, so just pick a dedicated page in the file as the root page of the journal, allowing the two files to be co-mingled. It doesn't address every possible bad reason for co-mingling the data, but it would at least answer the question "how do you find the journal". My second point was about existing SQLite database files that live in a file system managed by some operating system. SQLite already foists that work off on to someone else, this would be no different. It still may be a bad idea, but that's not the reason why it wouldn't work. :) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] presentation about ordering and atomicity of filesystems
Scott Robison wrote: On Fri, Sep 12, 2014 at 6:21 PM, Richard Hipp wrote: On Fri, Sep 12, 2014 at 8:07 PM, Simon Slavin wrote: one thing that annoys me about SQLite is that it needs to make a journal file which isn't part of the database file. Why ? Why can't it just write the journal to the database file it already has open ? This would reduce the problems where the OS prevents an application from creating a new file because of permissions or sandboxing. Where in the database does the journal information get stored? At the end? What happens then if the transaction is an INSERT and the size of the content has to grow? Does that leave a big hole in the middle of the file when the journal is removed? During recovery after a crash, where does the recovery process go to look for the journal information? If the journal is at some arbitrary point in the file, where does it look. Note that we cannot write the journal location in the file header because the header cannot be (safely) changed without first journaling it but we cannot journal the header without first writing the journal location into the header. Journaling filesystems already have this problem. By default they just use a section of the partition, reserved at FS creation time. Which leads to the problem already described in the video that started this thread - perform a large enough write operation and you can exceed the fixed size of the journal, which requires the journal data to be split and the operation journal update is no longer atomic. Of course, most journaling filesystems also allow you to optionally specify an external journal - i.e., instead of embedding the journal on the filesystem's partition, you can use some other block device instead. Naturally you can also choose a larger size when doing this. Putting the journal on a separate device can bring some major performance benefits, as well as accomodating larger transactions. 
In the tests I did two years ago, JFS with an external journal was blazingly fast. http://symas.com/mdb/microbench/july/#sec11 One idea that might work is to interleave the journal information with the content. So for each page in the database, there is a corresponding page of journal content. The downside there is that you double the size of the database file without increasing its storage capacity. This is why LMDB is much better suited to this task - it uses no journal at all, nor does it require compaction/defragmentation/VACUUMing. A couple of academic thoughts. 1. If one wanted to embed the journal within the database, would it be adequate to reserve a specific page as the "root" page of the journal, then allocate the remaining pages as normal (either to the journal or the main database)? This does leave the big hole problem so it may still not be ideal, but it would give you a known location to find the beginning of the journal without doubling the database size or requiring an extra file. Starting with a known location is definitely a step in the right direction. 2. Building on 1, could sparse files be used to accomplish this? Seek to "really big constant offset" and do all journaling operations at that point, allowing the operating system to manage actual disk allocation? If We're talking about implementing a filesystem. "the operating system" is your own code, in this case, you don't get to foist the work off onto anyone else. this were possible, deleting the journal would be a "fast" truncate operation. A custom VFS might be able to provide a proof of concept... hmm. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] presentation about ordering and atomicity of filesystems
Scott Robison wrote: On Sat, Sep 13, 2014 at 1:43 PM, Richard Hipp wrote: Decades ago, files came in all kinds of varieties and permutations. Details varied from one OS to the next. But it was common to have a distinction between text files and binary files (with different APIs for accessing each.) It was also common to have a difference between "ordinary files" (that had to be read sequentially from beginning to end) and "random-access files", which supported operations similar to lseek(). (Once again, completely incompatible APIs existed for accessing each file type.) With binary files, one often had to specify a "block size" which was the increment in which the file was read and written. The block size was typically a property of the file and could not be changed after the file had been created. There were often restrictions on the permitted values for block sizes. And you couldn't ask the operating system to tell you whether a file was text or binary or sequential or random-access or what its block-size was; you just had to know. And bewildering problems resulted if you got it wrong. And this was not true just of big expensive business class machines. At the very least (and I suspect more) Commodore 8-bit DOS (which was embedded within a smart drive with its very own dedicated CPU & RAM) supported (essentially) sequential byte stream files (no random seeking for these!) and random access record oriented files (where the record size was set at file creation time). Man were those a pain in the backside to use. Now imagine writing an ftp client (or server) for one of these. I wrote both for an IBM mainframe, way back when. You had to trust the user to select the appropriate record mode for the ftp transfer to succeed without just getting garbage on the other end.
Re: [sqlite] presentation about ordering and atomicity of filesystems
Scott Robison wrote: On Sat, Sep 13, 2014 at 9:39 AM, Richard Hipp wrote: I say that a filesystem is an eventually-consistent key/value database. The keys are the filenames and the values are all big BLOBs, specifically the file content. Filesystems also have a hierarchical keyspace, which is an extension from the usual key/value concept, but it is still key/value. Dan Bernstein, author of qmail & djbdns (among others), used the file system as a configuration database for those applications. Rather than having a text configuration file, he used the directory and file names as keys and their contents as values. I seem to recall him later regretting this choice (in part, anyway) but I always thought there was a certain elegance to that solution. It's not perfect, but what is? OpenLDAP's config database currently uses the filesystem this way as well. It's no paragon of efficiency, but it doesn't need to be particularly performant in the first place, and it requires zero setup.
Re: [sqlite] presentation about ordering and atomicity of filesystems
Simon Slavin wrote: That's not what I meant. That's the file as a database. What I want is the entire volume as a database. That's exactly what I pointed you to before. The thesis is pretty enlightening too. http://www.fsl.cs.sunysb.edu/docs/kbdbfs-msthesis/
Re: [sqlite] presentation about ordering and atomicity of filesystems
Nico Williams wrote: On ZFS datasets with sync disabled fsync() functions as osync(), as a write barrier without durability and without the associated penalty. The obvious problem is that we really do need osync() and fsync(); just one or the other is not a reasonable compromise. Write barriers have been debated in Linux ad nauseam. I agree that osync() would be great to have, but it's still a die roll - the OS can flush blocks to the storage device in order, but without waiting for the storage device's buffer to empty, it can't make any further ordering promises from there. You need device-level ordering support too. - which prompted my suggestion here http://www.spinics.net/lists/linux-fsdevel/msg70047.html
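For reference, the pattern under discussion is the usual write-ahead sequence, which is forced to use fsync() both as an ordering barrier and as a durability point because nothing like osync() exists in standard APIs. A rough sketch (paths and record format hypothetical):

```python
import os

def ordered_commit(journal_path, data_path, record):
    # Step 1: make the journal record durable before touching the data.
    fd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)   # used purely as a write barrier here, but it also
    finally:           # waits on the media; an osync() would skip the wait
        os.close(fd)
    # Step 2: only now is it safe to overwrite the data file in place.
    fd = os.open(data_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)
```

The first fsync() is only there to order the journal write before the data write, yet it pays the full durability penalty - which is exactly the cost an ordering-only primitive would avoid.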
Re: [sqlite] presentation about ordering and atomicity of filesystems
Richard Hipp wrote: On Thu, Sep 11, 2014 at 5:49 PM, Kees Nuyt wrote: Hi all, Today I bumped into a presentation about ordering and atomicity of filesystems that might interest you. https://www.youtube.com/watch?v=YvchhB1-Aws The Application/Storage Interface: After All These Years, We're Still Doing It Wrong Remzi Arpaci-Dusseau, University of Wisconsin—Madison Talk at usenix 2014 Published on Sep 4, 2014 by USENIX Association Videos Somewhat related to the article drh recently wrote about using sqlite as an application data store. Thanks for the link, Kees! I just finished watching the video. Remzi Arpaci-Dusseau talks about research (done by him and his graduate students) into how well application data survives system crashes. Remzi observes that filesystem developers have worked very hard for many years ensuring that filesystem metadata is preserved in a crash, but they seem less concerned about protecting application data. Remzi developed tools (BOB and ALICE) to study various workloads to see how vulnerable they were to system crashes. He looked at various "applications". His definition of "application" includes standalone programs like Git and Hg, and database servers like PostgreSQL, and libraries like SQLite and LevelDB. At one point he shows a chart that counts the number of unwarranted assumptions that the applications make about filesystem behavior. Such unwarranted assumptions can lead to corruption following a system crash (or power loss). SQLite and PostgreSQL came out on top, with just one vulnerability each. Hg and Git each had many vulnerabilities. In fairness, Remzi points out that these vulnerabilities assume a "worst case" filesystem and that many of them might not exist on a modern filesystem like EXT4. Actually LMDB comes out on top with zero vulnerabilities. I spoke to the UWisc folks to find out what the one Atomicity vulnerability they reported in LMDB was, and we confirmed that it was not in fact a valid vulnerability. 
Re: [sqlite] presentation about ordering and atomicity of filesystems
Simon Slavin wrote: On 12 Sep 2014, at 3:18am, Scott Robison wrote: That was an interesting 90 minutes. Indeed. Thanks to Kees for posting it. Though I was surprised he didn't mention the term 'ACID' explicitly. I'm still of the opinion that we need an actual transactional file system with equivalents to BEGIN, END and ROLLBACK. It will have to support many transactions at the same time, of course, since each process will be doing its own thing. There have been such projects. They don't seem to have continued though. http://mile-outta-boyce.blogspot.ie/2007/05/kernel-berkeley-db-file-system-kbdbfs.html https://github.com/dmeister/kbdb/tree/master/kbdbfs-1.0 I've got a project underway to retrace their steps, using LMDB instead of BerkeleyDB.
Re: [sqlite] SQlite on flash filesystem
Simon Slavin wrote: On 16 Aug 2014, at 9:32pm, Baruch Burstein wrote: On Fri, Aug 15, 2014 at 10:45 AM, Levente wrote: I am thinking about putting an SQLite database on a flash drive (SD card). I would like to know what kind of file system is optimal. I want to query a lot, and insert or update a few times per minute. Not directly related to your question, but I am curious: Since SQLite's atomicity guarantee assumes that writes to a disk sector are linear (e.g. either beginning to end or end to beginning, but never the middle before either end), Mmm ... no. SQLite's atomicity assumes in-order execution of writes. In other words, that the changes to the storage device will be done in the order they are requested. Anything else breaks ACID. However, SQLite does a lot of things to try to avoid a corrupted database even if ACID is broken, in an effort to let you rescue the maximum amount of data even if something went wrong with hardware. Removing all the 'paranoid about hardware' and 'rescue' code from SQLite would make it significantly smaller. I was wondering if this assumption was ever verified for flash drives. I have no information to the contrary, but while that assumption makes a lot of sense for a spinning disk, I see no reason to assume this for flash drives and other SSDs. I would think that they may parallelize writes to different parts of the "sector" for efficiency. Current Flash chips are optimized for sequential writes - while they allow writes to a sector in random order, doing so requires sending a new row address for each random access. Meanwhile, if you simply write them sequentially the write address auto-increments. Since operating systems always perform whole-sector writes (indeed, they perform whole-page writes) to current storage devices, and they use a DMA controller which is programmed with a start address and a block length, you can safely assume that they are writing each sector sequentially. 
There is no efficiency gain in writing multiple sections of a single sector in parallel. (In fact it is an efficiency loss.) Multi-channel SSDs parallelize across multiple Flash chips, sending whole sectors to each separate channel/chip. A good question - by which I mean it's one people should worry about, not one whose answer is actually in doubt.
Re: [sqlite] sqlite & multi process usage
Richard Hipp wrote: On Thu, Jul 17, 2014 at 5:10 AM, Micka wrote: Well, all of my processes are using a mutex to protect sqlite from concurrent access. What kind of mutex are you using that works across processes? All the mutexes I know about only work for a single process. pthreads supports process-shared mutexes. http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutexattr_getpshared.html BerkeleyDB and LMDB use these to support multi-process access to a DB, which is why their forks of SQLite3 also support multi-process access. It seems that FreeBSD and other related BSDs still don't implement these though, so you have to use semaphores on them instead. (Either SysV style or POSIX named semaphores.)
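A rough illustration of cross-process locking, using Python's multiprocessing module as a stand-in for pthread process-shared mutexes (the mechanism differs - multiprocessing's Lock is backed by an OS semaphore rather than pthread_mutexattr_setpshared - but the effect is the same; names are hypothetical):

```python
import multiprocessing as mp

def _worker(lock, counter, n):
    for _ in range(n):
        with lock:                     # serializes across *processes*
            counter.value += 1

def shared_lock_demo(procs=4, increments=1000):
    ctx = mp.get_context("fork")       # POSIX-only; keeps the demo simple
    lock = ctx.Lock()                  # kernel-backed, visible to children
    counter = ctx.Value("i", 0, lock=False)
    ps = [ctx.Process(target=_worker, args=(lock, counter, increments))
          for _ in range(procs)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    return counter.value               # == procs * increments when locked
```

Without the lock, the read-modify-write on the shared counter would race and the total would come up short; with it, every increment from every process is serialized, which is exactly the property a process-shared mutex gives a multi-process database.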
Re: [sqlite] SQLite scalability
Valentin Davydov wrote: On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote: Hi all, I know SQLite is supposed to support DB sizes in the TB (I think the default configuration can reach 1TB). I am curious if anyone actually uses SQLite at anywhere near this. Yes. Does anyone use it regularly for DBs 500GB+ in size, or with tables containing 10 billion rows+? I've got an installation with an SQLite DB of several terabytes in size. It contains about 20 billion thoroughly indexed records, and grows every day (more precisely, every night) by a few tens of millions of new records. How much concurrency does your use require? I've spent some effort to eliminate concurrency in the application. That is, updates and selects occur at very different times of the day. How long do selects take (assuming indexes are set correctly?) It depends on the size of the select. A single row is selected instantaneously. Checking uniqueness takes about 1-2 minutes per 1 million records, most of the time being spent parsing SQL commands. Whereas aggregate functions over a substantial fraction of the entire database, of course, take too long to be executed in real time. Use SQLightning instead - concurrency issues are irrelevant then, since writers don't block readers. And it will search multiple gigabytes per second, as opposed to your millions-per-minute figure above. Are there problems of locking ("normal" SQLite doesn't usually suffer from locking since transactions are very quick, but if transactions can be in the order of 100's of ms, I think locking can easily happen if the DB is accessed concurrently a few times a second, though I am not sure if this may only apply to writes). Yes. A single transaction (insertion of those tens of millions of new records) takes hours in the worst case. There's no good reason for insertion of 10 million records to take hours. I understand that the answer to most of these questions can be very hardware (and software) dependent, Indeed not so. 
The only hardware capable of storing such an amount of data is an array of magnetic disks, and their latency time (about 10-20 ms for random access) is much more than any reasonable software overhead. Even cache (internal SQLite page cache and/or operating system file cache) occupies the same memory and therefore has almost the same effect. The only software which determines the performance is SQLite itself, in my case, perhaps, the tree rebalancing algorithm. 1TB SSDs are only ~$500; there's no reason to limit yourself to the slowness of magnetic disks these days. http://www.amazon.com/Samsung-Electronics-EVO-Series-2-5-Inch-MZ-7TE1T0BW/dp/B00E3W16OU SQLightning uses only the OS filesystem cache, so you get maximal use of the available system RAM instead of wasting half of it with redundant copies in application-level caches. but I am just trying to get a feel for SQLite's applicability for a project I am working on that may reach limits like these. The only definitive SQLite limits are documented in the relevant manual page.
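On the "most of the time being spent parsing SQL commands" point above: reusing one parameterized statement against an indexed column avoids exactly that overhead. A sketch with a hypothetical schema, using Python's stdlib sqlite3 (which caches the compiled statement, so the SQL text is parsed once, not once per probe):

```python
import sqlite3

def check_unique(con, keys):
    # One parameterized statement, re-executed per key: the SQL is
    # compiled once and each probe is an index lookup, not a scan
    # and not a fresh parse of new SQL text.
    q = "SELECT EXISTS(SELECT 1 FROM records WHERE key = ?)"
    return [bool(con.execute(q, (k,)).fetchone()[0]) for k in keys]
```

The same effect is available from C via sqlite3_prepare_v2() once plus sqlite3_bind/step/reset in the loop, instead of building SQL strings per record.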
Re: [sqlite] Row Level Locking as in InnoDB
Raheel Gupta wrote: Look at the performance difference between BDB and SQLite3 here http://symas.com/mdb/microbench/#sec1 I did, and I really can't comment on that. The results are from 2012 and it's almost 2013. You should update the page with a newer result set. Or you could just download the code and run it yourself. I can't think of any other single feature that would remove the "lite" I am not a database expert. If you say so, it must be the case. But if there is a way to implement concurrent writers in SQLite maintaining the "lite" in SQLite, I would be the happiest person here :) You seem to enjoy asking a lot of others, without regard to cost.
Re: [sqlite] Row Level Locking as in InnoDB
Raheel Gupta wrote: Hi, Do you use BDB SQL or BDB KV? I built BDB 6.0.20 with --enable-sql_compat It made a libsqlite3.so in the .libs folder which I linked with my QT C++ Application. You must try it with SQLightning too, https://gitorious.org/mdb/sqlightning I tried to build it, but it says lmdb.h missing. Will check it as well. You must copy (or symlink) the lmdb source code into the SQLightning source directory. On Fri, Nov 8, 2013 at 4:12 PM, Aris Setyawan wrote: This is the BDB SQL doc I found. http://docs.oracle.com/cd/E17076_02/html/bdb-sql/dbfeatures.html#bulkloading If you insert records in bulk, you can use PRAGMA TXN_BULK for optimization. I tested TXN_BULK, still pretty slow. Nearly 2.4 times slower. Also the space wastage is pretty high. I set a page size of 64K in this and the space wastage is very high as the DBNAME.db-journal folder has too many files. DBNAME.db is 448MB DBNAME.db-journal is 161MB. Which is after inserting 10 rows of 4K = 400MB. In SQLITE 3.7.17 I get only 6.69% wastage of space. I wish BDB worked faster to see my alternatives. BDB is inherently a very slow codebase. Look at the performance difference between BDB and SQLite3 here http://symas.com/mdb/microbench/#sec1 - on random writes BDB KV is only 8% faster than SQLite. If you add the SQL parsing overhead on top of that, that 8% performance margin is erased. If you use batched random writes, SQLite is already 2% faster than BDB KV, so BDB SQL can only be slower than SQLite. Whatever other differences there may be, there is no performance benefit to using BDB as a backend for SQLite. In most cases there is a performance loss.
Re: [sqlite] Row Level Locking as in InnoDB
Aris Setyawan wrote: Hi Howard, I just looked, sophia is nothing special. See these microbench results. http://pastebin.com/cFK1JsCN LMDB's codebase is still smaller and faster. Nothing else touches LMDB's read speed. This is a micro benchmark from the sophia author comparing with lmdb. http://sphia.org/benchmarks.html Quite off-topic for this list, but those results are garbage. https://github.com/pmwkaa/sophia_benchmark/issues/2#issuecomment-27740082
Re: [sqlite] Row Level Locking as in InnoDB
Aris Setyawan wrote: SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. Your MVCC is different compared to InnoDB or BDB locking. Everyone should carefully read each DB's docs, then test it before deciding to use it. Yes, it's different. In LMDB writers never block readers and readers never block writers. The original poster was complaining about SELECT taking a long time and preventing other threads from making progress. That problem doesn't exist in LMDB. BDB locking *might* be able to avoid this in many cases, if there are no hotspots, but is prone to deadlocks which require the calling application to retry failed requests. LMDB is a storage engine optimized for reads. Why don't you optimize its writes using a cache-oblivious data structure, for example an LSM tree, or create a new one, as in the sophia (sphia.org) key-value DB? I just looked, sophia is nothing special. See these microbench results. http://pastebin.com/cFK1JsCN LMDB's codebase is still smaller and faster. Nothing else touches LMDB's read speed.
Re: [sqlite] Row Level Locking as in InnoDB
Raheel Gupta wrote: @Howard I had tested your code earlier but it didn't seem to be stable and getting it to run was a task. Also I learnt that it is an "in-memory" database. False. LMDB is a memory-mapped disk database, which is not the same as an in-memory database. @Aris are you saying BDB is better and faster than SQLite? Oracle claims that. From what I can see, Oracle is wrong. On Sun, Nov 3, 2013 at 8:28 PM, Howard Chu wrote: Aris Setyawan wrote: SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. Your MVCC is different compared to InnoDB or BDB locking. Everyone should carefully read each DB's docs, then test it before deciding to use it. LMDB is a storage engine optimized for reads. Why don't you optimize its writes using a cache-oblivious data structure, for example an LSM tree, or create a new one, as in the sophia (sphia.org) key-value DB? Because read optimization is what was important to us when I created LMDB. That's like asking why a hammer isn't a screwdriver. On 11/3/13, Howard Chu wrote: Aris Setyawan wrote: SQLite does not use row-level locking, but db-level locking, so it was the right behavior that the second thread was blocked. For InnoDB-like behavior in SQLite, Oracle has a SQLite-compatible API, but using a BDB backend. Since BDB uses MVCC (row/page-level locking), your threads are only blocked if they write to the same row/page. www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf * You must be aware that BDB now has an AGPL license. SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. https://gitorious.org/mdb/sqlightning/ On 11/3/13, Raheel Gupta wrote: Hi, I have been using SQLite for one project of mine and I will be storing TBs of Data. 
Now there will be a lot of selections in this database and I have come across one problem with SQLite. In journal_mode=delete the selection is database locked. When one thread does a "TRANSACTION" on the database and soon after another thread does "SELECT" on the database (using the same connection) or vice versa, the second thread has to wait till the first thread finishes. In order to avoid this, I had to use journal_mode=wal so that two threads don't have to wait when they both are doing SELECTs which might be taking 3-5 seconds to process. I was wondering if Row Level Locking would be introduced in journal_mode=delete as it's there in InnoDB for MySQL. At least for selects or inserts Row Level Locking should be possible, as neither modifies the existing rows. journal_mode=wal is a little slower and has its own limitations over NFS. OR if there is a mode equivalent to innodb in SQLITE please do let me know. I need to do a lot of selects and inserts in my application and hence a row level locking is suitable vs table or database level locking.
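For reference, the WAL workaround described above is a single pragma. A minimal sketch using Python's stdlib sqlite3 (path hypothetical; WAL requires a real file, not :memory:):

```python
import sqlite3

def open_wal(path):
    con = sqlite3.connect(path)
    # In WAL mode readers proceed while a writer holds a transaction,
    # avoiding the SELECT-blocks-TRANSACTION stall described above.
    mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    assert mode == "wal"   # the pragma returns the mode actually set
    return con
```

The mode is persistent - once set, subsequent connections to the same database file open it in WAL mode automatically.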
Re: [sqlite] Row Level Locking as in InnoDB
Aris Setyawan wrote: SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. Your MVCC is different compared to InnoDB or BDB locking. Everyone should carefully read each DB's docs, then test it before deciding to use it. LMDB is a storage engine optimized for reads. Why don't you optimize its writes using a cache-oblivious data structure, for example an LSM tree, or create a new one, as in the sophia (sphia.org) key-value DB? Because read optimization is what was important to us when I created LMDB. That's like asking why a hammer isn't a screwdriver. On 11/3/13, Howard Chu wrote: Aris Setyawan wrote: SQLite does not use row-level locking, but db-level locking, so it was the right behavior that the second thread was blocked. For InnoDB-like behavior in SQLite, Oracle has a SQLite-compatible API, but using a BDB backend. Since BDB uses MVCC (row/page-level locking), your threads are only blocked if they write to the same row/page. www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf * You must be aware that BDB now has an AGPL license. SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. https://gitorious.org/mdb/sqlightning/ On 11/3/13, Raheel Gupta wrote: Hi, I have been using SQLite for one project of mine and I will be storing TBs of Data. Now there will be a lot of selections in this database and I have come across one problem with SQLite. In journal_mode=delete the selection is database locked. When one thread does a "TRANSACTION" on the database and soon after another thread does "SELECT" on the database (using the same connection) or vice versa, the second thread has to wait till the first thread finishes. 
In order to avoid this, I had to use journal_mode=wal so that two threads don't have to wait when they both are doing SELECTs which might be taking 3-5 seconds to process. I was wondering if Row Level Locking would be introduced in journal_mode=delete as it's there in InnoDB for MySQL. At least for selects or inserts Row Level Locking should be possible, as neither modifies the existing rows. journal_mode=wal is a little slower and has its own limitations over NFS. OR if there is a mode equivalent to innodb in SQLITE please do let me know. I need to do a lot of selects and inserts in my application and hence a row level locking is suitable vs table or database level locking.
Re: [sqlite] Row Level Locking as in InnoDB
Aris Setyawan wrote: SQLite does not use row-level locking, but db-level locking, so it was the right behavior that the second thread was blocked. For InnoDB-like behavior in SQLite, Oracle has a SQLite-compatible API, but using a BDB backend. Since BDB uses MVCC (row/page-level locking), your threads are only blocked if they write to the same row/page. www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf * You must be aware that BDB now has an AGPL license. SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC and thus supports high concurrency. It is also many times faster than BerkeleyDB and vanilla SQLite. https://gitorious.org/mdb/sqlightning/ On 11/3/13, Raheel Gupta wrote: Hi, I have been using SQLite for one project of mine and I will be storing TBs of Data. Now there will be a lot of selections in this database and I have come across one problem with SQLite. In journal_mode=delete the selection is database locked. When one thread does a "TRANSACTION" on the database and soon after another thread does "SELECT" on the database (using the same connection) or vice versa, the second thread has to wait till the first thread finishes. In order to avoid this, I had to use journal_mode=wal so that two threads don't have to wait when they both are doing SELECTs which might be taking 3-5 seconds to process. I was wondering if Row Level Locking would be introduced in journal_mode=delete as it's there in InnoDB for MySQL. At least for selects or inserts Row Level Locking should be possible, as neither modifies the existing rows. journal_mode=wal is a little slower and has its own limitations over NFS. OR if there is a mode equivalent to innodb in SQLITE please do let me know. I need to do a lot of selects and inserts in my application and hence a row level locking is suitable vs table or database level locking.
Re: [sqlite] logical to use sqlite to replace Berkeley DB ?
Patrick wrote: Hi Everyone After Oracle acquired Berkeley DB they changed the license. If people don't pay a significant licensing fee, it can now only be used for GPL code. I don't know Berkeley DB very well but I do know a moderate amount of Sqlite. I want to tinker with a compiler that uses DB, I was thinking about ripping it out and replacing it with Sqlite. Does this make sense? I know they are both zero configuration embedded DBs but DB is a key-value based one and I am assuming lighter, is this true? Any idea of how close they would be in terms of memory use and execution speed? BDB is much faster than SQLite, yes. In fact Oracle supplies a port of SQLite that uses BDB as the underlying Btree engine instead of SQLite's native code, and there's a significant performance gain. If you have an app that is comfortably using the key-value API of BDB it would introduce major inefficiencies to convert it to using SQL. So no, this doesn't seem like a logical action to take. If you're using BDB and want to switch off it because of the license issue, try OpenLDAP LMDB instead. No license hassles, and also several times smaller and faster than BDB.
Re: [sqlite] Minimizing Internal Function Calls in SQLite
Richard Hipp wrote: On Tue, Jul 30, 2013 at 8:00 AM, Andrew Beal wrote: Hi All, Is there a way to minimize function calls internal to SQLite? At some points of execution, I have measured as many as 35 levels of nested function calls. Since we are running on an embedded system, we don't support such a deep call table. Wow. What embedded system is it that doesn't support a call stack that is *only* 35 levels deep? The other obvious question - what kind of embedded system with such limited stack resources actually needs SQL for its data storage purposes? Is your schema really so complex? Are your queries? Do you need to do JOINs across multiple tables?
Re: [sqlite] Large Database Windows vs Linux
David de Regt wrote: It's the kind of useful help like this that makes me love the FOSS movement. All based in facts, of course. http://blog.zorinaq.com/?e=74 -David -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Walter Hurry Sent: Friday, June 28, 2013 5:09 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] Large Database Windows vs Linux On Fri, 28 Jun 2013 15:22:57 -0600, Keith Medcalf wrote: That would explain why the best thing to be done with System Destroyer (System Restore) is the same as the best way to handle the Hardware Destroyer (Power Management) in Windows. Disable it completely. The best thing to do with Windows is format the drive and install Unix or FreeBSD or Linux.
Re: [sqlite] SQLite 3.7.17 preview - 2x faster?
Dan Kennedy wrote: On 04/04/2013 08:44 PM, Howard Chu wrote: Richard Hipp wrote: The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and solaris. We have found that it does not work on OpenBSD, for reasons we have not yet been able to uncover; but as a precaution, memory mapped I/O is disabled by default on all of the *BSDs until we understand the problem. As I understand it, OpenBSD lacks a unified buffer cache. They reported problems with LMDB in its default mode, too. But it works in some non-default mode? When both reads and writes are done via memory mapping? Or some other trick? Right. It works if you use a writable mmap and do all reads and writes thru the map. But any process that comes along and accesses the file using read will see invalid/stale information, and start double-caching the file pages.
Re: [sqlite] SQLite 3.7.17 preview - 2x faster?
Richard Hipp wrote: The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and solaris. We have found that it does not work on OpenBSD, for reasons we have not yet been able to uncover; but as a precaution, memory mapped I/O is disabled by default on all of the *BSDs until we understand the problem. As I understand it, OpenBSD lacks a unified buffer cache. They reported problems with LMDB in its default mode, too. But FreeBSD should be OK. I don't know about any of the other BSD variants. The biggest performance gains occur on windows, mac, and solaris. The new code is also faster on linux, but not by as big a factor. The speed improvement is also heavily dependent upon workload. Some operations can be almost twice as fast. For others, there is no measurable speed improvement. Your feedback on whether or not the new code is faster for you, and whether or not it even works for you, is very important to us. Thanks for giving the new code a try.
Re: [sqlite] SQLite 3.7.17 preview - 2x faster?
Ryan Johnson wrote: 3. It seems like this would increase the "attack surface" for stray pointers in the host program. Granted, writes to stray pointers are not sqlite's fault, but they're an unfortunately common problem... and mmap makes user bugs more likely to directly corrupt the database on disk. Perceived reliability might drop as a result (I'm not arguing that the risk is worth giving up 2x, just pointing it out as a potential unintended consequence). This is why OpenLDAP LMDB uses a read-only mmap by default. User bugs get an immediate SEGV, and usually the bug becomes obvious and easy to fix.
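The protection Howard describes can be sketched with Python's stdlib `mmap` standing in for LMDB's map (an illustrative example; the file is a made-up stand-in for a database, and Python surfaces the violation as an exception where C code would take a SIGSEGV):

```python
import mmap
import os
import tempfile

# A scratch file standing in for a database (hypothetical example).
fd, path = tempfile.mkstemp()
os.write(fd, b"database page contents")
os.close(fd)

with open(path, "rb") as f:
    # Map the file read-only, as LMDB does by default.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    assert m[:8] == b"database"        # reads work normally
    stray_write_blocked = False
    try:
        m[0:1] = b"X"                  # a stray write through the map...
    except (TypeError, ValueError):
        stray_write_blocked = True     # ...fails loudly, instead of silently
    m.close()                          #    corrupting the database

with open(path, "rb") as f:
    intact = f.read(8) == b"database"  # on-disk data is unchanged
os.unlink(path)
```

The point is that with a writable map, the stray store would have succeeded and eventually been flushed to disk; with a read-only map the bug announces itself at the faulting instruction.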
Re: [sqlite] experimental (better?) usage based sqlite cache
Simon Slavin wrote: On 25 Feb 2013, at 11:33am, Howard Chu wrote: Gabriel Corneanu wrote: Following a few other discussions, I had the feeling that sqlite should benefit from a cache which discards cached pages in a least frequently used order. Just offhand, classical LRU is quite poor in terms of lock overhead. Gabriel writes "least frequently used". Howard writes "least recently used". You're not writing about the same thing. Doh, you're right. Sorry for the noise, going back to get some caffeine now.
Re: [sqlite] experimental (better?) usage based sqlite cache
Gabriel Corneanu wrote: Following a few other discussions, I had the feeling that sqlite should benefit from a cache which discards cached pages in a least frequently used order. Just offhand, classical LRU is quite poor in terms of lock overhead. The CLOCK refinement scales much better, because no reorganizing of LRU lists is needed during page references. And of course, having gone thru all of these exercises of fancy application-level cache algorithms already, it's still obvious that the best approach is to leave it to the kernel.
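The CLOCK refinement Howard mentions can be sketched in a few lines (illustrative only, not SQLite's or LMDB's actual pager code). The key property is visible in `access()`: a cache hit only sets a reference bit, so the hot read path never reorders a shared list; only eviction sweeps the clock hand:

```python
class ClockCache:
    """Minimal CLOCK page-cache sketch.

    A hit sets the slot's reference bit and nothing else. Eviction
    advances the clock hand, clearing bits, until it finds a slot whose
    bit is already clear; every resident page thus gets a "second
    chance" before being replaced.
    """

    def __init__(self, nslots):
        self.slots = [None] * nslots    # cached page keys
        self.refbit = [False] * nslots  # one reference bit per slot
        self.hand = 0                   # the clock hand
        self.index = {}                 # page key -> slot number

    def access(self, key):
        """Touch page `key`; return True on a cache hit."""
        if key in self.index:
            self.refbit[self.index[key]] = True   # hit: just set the bit
            return True
        # Miss: sweep the hand until an unreferenced slot turns up.
        while self.refbit[self.hand]:
            self.refbit[self.hand] = False
            self.hand = (self.hand + 1) % len(self.slots)
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim]                # evict the victim page
        self.slots[self.hand] = key
        self.index[key] = self.hand
        self.refbit[self.hand] = True
        self.hand = (self.hand + 1) % len(self.slots)
        return False
```

Compared to LRU, there is no list to splice on every reference, which is what makes the hot path cheap under concurrency; the trade-off is that eviction order only approximates recency.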
Re: [sqlite] Concurrent read performance
Wayne Bradney wrote: Howard, Thanks for the links - mdb/sqlightning looks interesting. Before I dive in, however: 1. As I understand it, sqlightning is a drop-in replacement for SQLite.Interop.dll, and I can still use System.Data.SQLite as my high-level interface? Right. 2. Is there a compiled release available? No. 3. How often is sqlightning updated against SQLite? We're at SQLite 3.7.15.2 / System.Data.SQLite 1.0.84.0 right now. There is no set schedule, basically as interest and demand dictate. 4. Does sqlightning allow lock-free reads, even against a SQLite memory-backed, shared cache database? No. But you can get the same effect simply by putting the MDB database onto a RAMdisk. You are putting programming effort into making your code fast, and this is costing you (or your employer) programmer time. For any reasonably useful piece of software, every moment of programmer time invested in proper coding saves eons of user time. Putting programmer effort into making correct code fast is always The Right Thing to Do. Software that delivers the correct answer, late, is still wrong.
Re: [sqlite] Concurrent read performance
Wayne Bradney wrote: All access in SQLite is serialized. Apologies if I'm missing something fundamental here, but that's not what I'm seeing with a file-backed database when shared cache is OFF. My test has a single table with 1M rows, and four queries that each yield 100K different rows. I run them two ways: 1. All queries in a loop on the same thread in the same connection. 2. Each query in parallel on separate threads, each with its own connection. If all access were serialized, I would expect these two tests to take about the same amount of time overall, wouldn't I? In fact, with a file-backed database and shared cache OFF, the second run takes about 70% less time. With shared cache ON, they're the same. As to your second point, I probably should have made it clear that this isn't an internal project, it's a software product, and we don't control where it runs. I understand what an SSD is and why it's better than a spindle drive, but my question wasn't really meant to solicit suggestions for performance improvements outside the proposal at hand, which was to retire our existing home-grown in-memory cache implementation (which is very fast for concurrent reads, but is extremely limited in how it can be queried), and replace it with a SQL-capable, relational store and still get roughly the same performance. Our expectation was that we could achieve this with SQLite, but were surprised by the apparent lack of read-concurrency, and wanted to get some input on what our options might be in terms of SQLite configuration of memory-backed databases. > From: slav...@bigfraud.org You should look into MDB, which does no locking for read operations. Reads scale perfectly across arbitrarily many CPUs. 
More info here http://symas.com/mdb/ and SQLite ported to use MDB as its backend is available here https://gitorious.org/mdb/sqlightning Date: Sat, 12 Jan 2013 17:48:56 + To: sqlite-users@sqlite.org Subject: Re: [sqlite] Concurrent read performance On 12 Jan 2013, at 5:38pm, Wayne Bradney wrote: "mode=memory&cache=shared" 1. when shared cache is enabled, all reads are serialized, and All access in SQLite is serialised. All transactions require locking the entire database. SQLite is very simple -- 'lite' -- so queries run extremely quickly, so you don't normally realise that any locking has taken place. 2. there doesn't seem to be any way to have a memory-backed database that can be accessed by multiple connections without using a shared cache, then I guess I MUST use a file-backed database to get concurrent reads, even though I don't need the persistence and don't want to take the I/O hit. Am I making any sense? Anything I'm missing? You are putting programming effort into making your code fast, and this is costing you (or your employer) programmer time. For any reasonably useful piece of software, every moment of programmer time invested in proper coding saves eons of user time. Putting programmer effort into making correct code fast is always The Right Thing to Do. Software that delivers the correct answer, late, is still wrong.
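Wayne's two-way experiment is easy to reproduce with a stdlib sketch (Python's `sqlite3` here rather than the System.Data.SQLite stack he was using; table name and row counts are made up). With shared cache off, each thread gets a private connection and page cache, so the four reads do not serialize against one another:

```python
import os
import sqlite3
import tempfile
import threading

# Build a file-backed database with 100K rows (rowids 1..100000).
path = os.path.join(tempfile.mkdtemp(), "bench.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v)")
db.executemany("INSERT INTO t(v) VALUES (?)", [(i,) for i in range(100_000)])
db.commit()
db.close()

results = []

def worker(lo, hi):
    # Each thread opens its OWN connection (shared cache is off by
    # default), so its reads run concurrently with the other threads.
    conn = sqlite3.connect(path)
    n = conn.execute(
        "SELECT count(*) FROM t WHERE id BETWEEN ? AND ?", (lo, hi)
    ).fetchone()[0]
    results.append(n)          # list.append is atomic in CPython
    conn.close()

# Four threads, each scanning a disjoint quarter of the table.
threads = [
    threading.Thread(target=worker, args=(i * 25_000 + 1, (i + 1) * 25_000))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Timing the threaded run against a single-connection loop over the same four queries shows the concurrency gap Wayne measured; with `cache=shared` in the connection URI, the two converge again.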
Re: [sqlite] Please help. Read only process being blocked by writer process.
Alejandro Martínez wrote: Thanks Richard, that makes perfect sense. Thanks Howard, but I don't know what you are talking about, so I will google "copy-on-write". See the papers and presentations here: http://www.symas.com/mdb Source code for SQLite is here: http://gitorious.org/mdb Григорий Григоренко, Interesting! I'll consider this approach if at some point I'm able to go "scorched earth" and start this from scratch, but at this point I would have to change too much stuff. I will go with the WAL solution for now. I'm just worried a buggy process could hang while having a prepared statement open and cause the wal file to grow forever, then causing errors in other processes.
Re: [sqlite] Please help. Read only process being blocked by writer process.
Richard Hipp wrote: On Wed, Nov 28, 2012 at 9:58 AM, Alejandro Martínez wrote: And wouldn't PRAGMA read_uncommitted achieve the effect I was expecting? Or does that cause other problems? read_uncommitted only works if both the read and writer are in the same process and are using shared cache. Reading "old" or inconsistent data would not be a problem for me. (as long as it is not corrupted data). That's really the crux of the problem. Suppose the writer needs to split a btree node, for example. To do this, the writer would typically write two new child pages and then overwrite the old b-tree page with the parent page. If the writer were part way through this sequence when the reader comes along, the reader would see corrupt data, since the btree structure would be only partially updated and hence not well-formed. That's one of the fundamental problems with update-in-place DB designs. (Aside from their crash vulnerability and expensive crash recovery...) MVCC via copy-on-write has none of these issues. On Wed, Nov 28, 2012 at 11:20 AM, Richard Hipp wrote: On Wed, Nov 28, 2012 at 7:09 AM, Alejandro Martínez wrote: Ok, I will probably do that. Thank you. But I'd like to know. Why doesn't this work without wal? A read only operation shouldn't block, right? If you are not running WAL, then the database is updated directly. That means that there can be no readers active when a write is in progress because then the readers would see an incomplete and uncommitted transaction. And regarding the commit failing, does that need a busy timeout handler too? From the documentation I thought it would just wait until all readers are done reading and then write. And that further incoming readers would wait for those 5 seconds. And I was expecting not to really wait, as the commit should be pretty quick, right? I'm puzzled. On Wed, Nov 28, 2012 at 10:01 AM, Richard Hipp wrote: On Wed, Nov 28, 2012 at 6:56 AM, Alejandro Martínez < elpeq...@gmail.com wrote: Is that the only way? 
When I had done that in the past, the wal file grew constantly and I am afraid it could fill the hard disk. That could happen if say... one of the reading processes doesn't properly sqlite3_reset a prepared statement after stepping it. Right? Correct. The WAL file will grow until a checkpoint resets it. And a checkpoint cannot reset the WAL file while there is a pending transaction. So if you have a statement holding a transaction open, the WAL file will grow without bound. The solution there is to not hold read transactions open indefinitely. Call sqlite3_reset() when you are done with a statement so that its implied read transaction will close. Thank you for your quick answer. On Wed, Nov 28, 2012 at 9:52 AM, Richard Hipp wrote: PRAGMA journal_mode=WAL
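The WAL behaviour Richard describes can be seen from the stdlib (a minimal sketch in Python's `sqlite3`; file name is made up). Closing the reader ends its implied read transaction — the Python-level analogue of calling `sqlite3_reset()`/`sqlite3_finalize()` on the statement — after which a checkpoint can reset the WAL file:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
db = sqlite3.connect(path)
db.execute("PRAGMA journal_mode=WAL")   # writers no longer block readers
db.execute("CREATE TABLE t(x)")
db.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
db.commit()

# A second connection reads while the writer connection stays open.
reader = sqlite3.connect(path)
rows = reader.execute("SELECT count(*) FROM t").fetchone()[0]

# End the read transaction; while it is pending, no checkpoint can
# fully reset the WAL, so the WAL keeps growing with each commit.
reader.close()

# With no reader pending, TRUNCATE checkpoints and zeroes the WAL file.
db.execute("PRAGMA wal_checkpoint(TRUNCATE)")
wal_size = os.path.getsize(path + "-wal")
db.close()
```

A reader process that hangs with a statement un-reset is exactly the failure mode Alejandro worries about: the `reader.close()` step never happens, and the `-wal` file grows without bound.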
Re: [sqlite] light weight write barriers
Ric Wheeler wrote: On 11/16/2012 10:06 AM, Howard Chu wrote: David Lang wrote: barriers keep getting mentioned because they are an easy concept to understand. "do this set of stuff before doing any of this other set of stuff, but I don't care when any of this gets done" and they fit well with the requirements of the users. Users readily accept that if the system crashes, they will lose the most recent stuff that they did, *some* users may accept that. *None* should. but they get annoyed when things get corrupted to the point that they lose the entire file. this includes things like modifying one option and a crash resulting in the config file being blank. Yes, you can do the 'write to temp file, sync file, sync directory, rename file' dance, but the fact that to do so the user must sit and wait for the syncs to take place can be a problem. It would be far better to be able to say "write to temp file, and after it's on disk, rename the file" and not have the user wait. The user doesn't really care if the changes hit disk immediately, or several seconds (or even 10s of seconds) later, as long as there is not any possibility of the rename hitting disk before the file contents. The fact that this could be implemented in multiple ways in the existing hardware does not mean that there need to be multiple ways exposed to userspace, it just means that the cost of doing the operation will vary depending on the hardware that you have. This also means that if new hardware introduces a new way of implementing this, that improvement can be passed on to the users without needing application changes. There are a couple industry failures here: 1) the drive manufacturers sell drives that lie, and consumers accept it because they don't know better. We programmers, who know better, have failed to raise a stink and demand that this be fixed. A) Drives should not lose data on power failure. 
If a drive accepts a write request and says "OK, done" then that data should get written to stable storage, period. Whether it requires capacitors or some other onboard power supply, or whatever, they should just do it. Keep in mind that today, most of the difference between enterprise drives and consumer desktop drives is just a firmware change, that hardware is already identical. Nobody should accept a product that doesn't offer this guarantee. It's inexcusable. B) it should go without saying - drives should reliably report back to the host, when something goes wrong. E.g., if a write request has been accepted, cached, and reported complete, but then during the actual write an ECC failure is detected in the cacheline, the drive needs to tell the host "oh by the way, block XXX didn't actually make it to disk like I told you it did 10ms ago." If the entire software industry were to simply state "your shit stinks and we're not going to take it any more" the hard drive industry would have no choice but to fix it. And in most cases it would be a zero-cost fix for them. Once you have drives that are actually trustworthy, actually reliable (which doesn't mean they never fail, it only means they tell the truth about successes or failures), most of these other issues disappear. Most of the need for barriers disappear. I think that you are arguing a fairly silly point. Seems to me that you're arguing that we should accept inferior technology. Who's really being silly? If you want that behaviour, you have had it for more than a decade - simply disable the write cache on your drive and you are done. You seem to believe it's nonsensical for someone to want both fast and reliable writes, or that it's unreasonable for a storage device to offer the same, cheaply. And yet it is clearly trivial to provide all of the above. 
If you - as a user - want to run faster and use applications that are coded to handle data integrity properly (fsync, fdatasync, etc), leave the write cache enabled and use file system barriers. Applications aren't supposed to need to worry about such details, that's why we have operating systems. Drives should tell the truth. In event of an error detected after the fact, the drive should report the error back to the host. There's nothing nonsensical there. When a drive's cache is enabled, the host should maintain a queue of written pages, of a length equal to the size of the drive's cache. If a drive says "hey, block XXX failed" the OS can reissue the write from its own queue. No muss, no fuss, no performance bottlenecks. This is what Real Computers did before the age of VAX Unix. Everyone has to trade off cost versus something else and this is a very, very long standing trade off that drive manufacturers have made. With the cost of storage falling as rapidly as it has in recent years, this is a stupid tradeoff.
Re: [sqlite] light weight write barriers
David Lang wrote: barriers keep getting mentioned because they are an easy concept to understand. "do this set of stuff before doing any of this other set of stuff, but I don't care when any of this gets done" and they fit well with the requirements of the users. Users readily accept that if the system crashes, they will lose the most recent stuff that they did, *some* users may accept that. *None* should. but they get annoyed when things get corrupted to the point that they lose the entire file. this includes things like modifying one option and a crash resulting in the config file being blank. Yes, you can do the 'write to temp file, sync file, sync directory, rename file' dance, but the fact that to do so the user must sit and wait for the syncs to take place can be a problem. It would be far better to be able to say "write to temp file, and after it's on disk, rename the file" and not have the user wait. The user doesn't really care if the changes hit disk immediately, or several seconds (or even 10s of seconds) later, as long as there is not any possibility of the rename hitting disk before the file contents. The fact that this could be implemented in multiple ways in the existing hardware does not mean that there need to be multiple ways exposed to userspace, it just means that the cost of doing the operation will vary depending on the hardware that you have. This also means that if new hardware introduces a new way of implementing this, that improvement can be passed on to the users without needing application changes. There are a couple industry failures here: 1) the drive manufacturers sell drives that lie, and consumers accept it because they don't know better. We programmers, who know better, have failed to raise a stink and demand that this be fixed. A) Drives should not lose data on power failure. If a drive accepts a write request and says "OK, done" then that data should get written to stable storage, period. 
Whether it requires capacitors or some other onboard power supply, or whatever, they should just do it. Keep in mind that today, most of the difference between enterprise drives and consumer desktop drives is just a firmware change, that hardware is already identical. Nobody should accept a product that doesn't offer this guarantee. It's inexcusable. B) it should go without saying - drives should reliably report back to the host, when something goes wrong. E.g., if a write request has been accepted, cached, and reported complete, but then during the actual write an ECC failure is detected in the cacheline, the drive needs to tell the host "oh by the way, block XXX didn't actually make it to disk like I told you it did 10ms ago." If the entire software industry were to simply state "your shit stinks and we're not going to take it any more" the hard drive industry would have no choice but to fix it. And in most cases it would be a zero-cost fix for them. Once you have drives that are actually trustworthy, actually reliable (which doesn't mean they never fail, it only means they tell the truth about successes or failures), most of these other issues disappear. Most of the need for barriers disappear.
Re: [sqlite] Shared in-memory SQLite database in shared memory
Jaco Breitenbach wrote: Dear all, My application consists of several independent processes that must all access (read) the same data table during processing. In order to optimise memory usage I was wondering if it is possible to load an in-memory SQLite database into shared memory. The database would be maintained by a separate management process and all other processes would only require read access. This way only one copy of the database would have to be held in system memory. Is this possible with the current implementation of SQLite? Any suggestions would be welcome. The OpenLDAP MDB (memory mapped database) library will do exactly what you want. Since it uses a shared memory map to access the DB, no matter how many processes access it concurrently there's only one copy of data present in RAM. It also performs reads faster than anything else, even pure in-memory databases. Nothing else is anywhere close to as efficient as MDB for reads. Read more here http://highlandsun.com/hyc/mdb/ The port of SQLite using MDB as its backend is available on gitorious https://gitorious.org/mdb/
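What Jaco asks for is essentially what a shared file mapping gives you for free (a Unix-only sketch using `os.fork` and Python's stdlib `mmap`; the file here is a made-up stand-in for the database). Every process maps the same file, and the kernel keeps a single copy of its pages in the page cache regardless of how many processes map it:

```python
import mmap
import os
import tempfile

# Stand-in for a read-only database file maintained by one writer process.
fd, path = tempfile.mkstemp()
os.write(fd, b"shared table data " * 1000)
os.close(fd)

def first_word(p):
    # A reader maps the file instead of copying it into its own heap;
    # the kernel backs every such mapping with the same physical pages.
    with open(p, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        word = m[:6]
        m.close()
    return word

pid = os.fork()
if pid == 0:                         # child: an independent reader process
    os._exit(0 if first_word(path) == b"shared" else 1)
_, status = os.waitpid(pid, 0)
child_ok = status == 0               # child saw the shared data
parent_ok = first_word(path) == b"shared"
os.unlink(path)
```

This is the mechanism MDB builds on: N reader processes cost one copy of the data in RAM, and read-only access needs no IPC with the management process at all.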
Re: [sqlite] light weight write barriers
Alan Cox wrote: How about that recently preliminary infrastructure to send ORDERED commands instead of queue draining was deleted from the kernel, because "there's no difference where to drain the queue, on the kernel or the storage side"? Send patches. Isn't any type of kernel-side ordering an exercise in futility, since a) the kernel has no knowledge of the disk's actual geometry b) most drives will internally re-order requests anyway c) cheap drives won't support barriers Even assuming the drives honored all your requests without lying, how would you really want this behavior exposed? From the userland perspective, there are very few apps that care. Probably only transactional databases, really. As a DB author, I'm not sure I'd be keen on this as an open() or fcntl() option. Databases that really care would be on dedicated filesystems and/or devices, so per-file control would be tedious. You would most likely want to say "all writes to this string of devices should be order-preserving" and forget about it. With that guarantee, a careful writer can have perfectly intact data structures all the time, without ever slowing down for a fsync.
[sqlite] light weight write barriers
On Thu, Oct 25, 2012, Nico Williams wrote: Incidentally, here's a single-file, bag of b-trees that uses a COW format: MDB, which can be found in git://git.openldap.org/openldap.git, in the mdb.master branch. Complete docs, design notes, and benchmark results are available here: http://highlandsun.com/hyc/mdb/ I already discussed some of this on the sqlite-dev mailing list back in July. I'll also be giving a talk on the design/internals of MDB in Barcelona at LinuxCon Europe on Wednesday November 7. Drop by if you're in the area... Our work on an sqlite 4 backend built on MDB is progressing as well.