Re: [sqlite] Theoretical write performance for low-throughput devices
Ok, skip that. locking_mode=EXCLUSIVE (1) takes care of that.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users On Behalf Of Keith Medcalf
>Sent: Tuesday, 22 October, 2019 12:46
>To: SQLite mailing list
>Subject: Re: [sqlite] Theoretical write performance for low-throughput devices
>
>>and only one thread accesses that connection at a time. The target
>>lacks mmap(), posix file locks, and multiple address spaces, so this
>>seemed like the right settings to use.
>
>So what happens to the shm file? Is it not normally just an ordinary
>file that is mmap'ped?
>
>What would be the effect of not having mmap available?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
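[Editor's note: the resolution above can be sketched with Python's built-in sqlite3 module. The filename is illustrative; the point is that setting locking_mode=EXCLUSIVE before the first WAL-mode access keeps the wal-index in heap memory, so no mmap'ed -shm file is created.]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
# EXCLUSIVE locking mode must be set before the first WAL operation;
# SQLite then uses heap memory for the wal-index instead of a -shm file.
conn.execute("PRAGMA locking_mode=EXCLUSIVE")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t(x)")
with conn:
    conn.execute("INSERT INTO t VALUES (1)")

wal_exists = os.path.exists(path + "-wal")  # the WAL file itself still exists
shm_exists = os.path.exists(path + "-shm")  # but no shared-memory file is made
print(wal_exists, shm_exists)
```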
Re: [sqlite] Theoretical write performance for low-throughput devices
>and only one thread accesses that connection at a time. The target
>lacks mmap(), posix file locks, and multiple address spaces, so this
>seemed like the right settings to use.

So what happens to the shm file? Is it not normally just an ordinary file that is mmap'ped?

What would be the effect of not having mmap available?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 5:47 PM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> >
> > No significant change. The target filesystem only caches non-aligned
> > writes, so there usually isn't anything for it to do on fsync anyway.
> >
>
> OK. I don't have any more ideas at the moment, and without access to
> your code, and your platform, I can't really debug it. But do please
> know that you should only be getting less than 2x writes. I suggest
> adding instrumentation and trying to come up with a simplified test
> case.

Should the vLogVfs be picking up writes to both the WAL and the database proper? I'm seeing a number of WRITEs in the log equal to the final number of pages plus a small percentage. For the case where the WAL is much larger than the benchmark, the number of WRITEs in the log is almost equal to the number of final pages in the database.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
Re: [sqlite] Theoretical write performance for low-throughput devices
On Tue, Oct 22, 2019 at 2:03 AM Wout Mertens wrote:
> This probably won't change a thing, but I wonder why you wouldn't set the
> sqlite page size to 2KB? Hopefully this means lots of aligned writes.

At one point, the row blobs were exactly 1024 bytes. This isn't great for 4 kB pages, but it is a particularly unfavorable row size for 2 kB database pages.

--
Jonathan Brandmeyer
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 5:28 PM Jonathan Brandmeyer <jbrandme...@planetiq.com> wrote:
> I'm working on an embedded system that uses a log-structured
> filesystem on raw NAND flash. This is not your typical workstation's
> managed flash (SATA/NVMe), or portable managed flash (SD/USB). It's a
> bare-nekkid ONFI-speaking chip. All reads and writes are one 2kB page
> at a time. There is no readahead, and no write buffering by the
> driver or filesystem for page-sized writes.

This probably won't change a thing, but I wonder why you wouldn't set the sqlite page size to 2KB? Hopefully this means lots of aligned writes.

Wout.
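[Editor's note: Wout's suggestion can be sketched with Python's sqlite3 module. Note that page_size only takes effect if set before the database's first content is written (or via a later VACUUM).]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Must be issued before the first table is created to take effect.
conn.execute("PRAGMA page_size = 2048")
conn.execute(
    "CREATE TABLE chunks(target_id INTEGER, chunk_num INTEGER, chunk_blob BLOB)"
)
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(page_size)  # 2048
```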
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, 21 Oct 2019 at 23:28, Jonathan Brandmeyer wrote:
> Or, how many times is each page written by SQLite for an insert-heavy
> test? The answer appears to be "4", but I can only account for two of
> those four.
>
> I'm working on an embedded system that uses a log-structured
> filesystem on raw NAND flash. This is not your typical workstation's
> managed flash (SATA/NVMe), or portable managed flash (SD/USB). It's a
> bare-nekkid ONFI-speaking chip. All reads and writes are one 2kB page
> at a time. There is no readahead, and no write buffering by the
> driver or filesystem for page-sized writes.
>
> We got the following performance numbers out of the flash storage:
>
> Streaming reads through the filesystem: 7.5 MB/s.
> Streaming writes through the filesystem: 5.4 MB/s.
> Single insert performance through SQLite: 0.2 MB/s.
> Bulk insert performance through SQLite: 1.3 MB/s, asymptotic for very
> large transactions.

There's perhaps a measurement missing here: random-access/seeking writes through the filesystem. That doesn't sound like it should be a factor given the technology involved, but it's more reflective of SQLite's workload when updating the DB.

The smallest possible insert here modifies at least one DB page for the table and one for the index, so that's 8 kB written to the journal and 8 kB to the main DB. But bulk inserts should greatly reduce the impact of index writes, as you said.

Hm, how does deletion/truncation perform on the target filesystem?

-Rowan
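[Editor's note: Rowan's accounting can be written out explicitly. Figures are assumptions taken from the thread: 4 kB pages, and a minimal single-row insert dirtying one table leaf page plus one index leaf page, each written once to the journal/WAL and once more to the main database.]

```python
PAGE_SIZE = 4096
pages_dirtied = 2  # one table leaf page + one index leaf page (assumed minimum)

wal_bytes = pages_dirtied * PAGE_SIZE         # written at commit time
checkpoint_bytes = pages_dirtied * PAGE_SIZE  # written again at checkpoint
print(wal_bytes, checkpoint_bytes)  # 8192 8192
```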
Re: [sqlite] Theoretical write performance for low-throughput devices
On 10/21/19, Jonathan Brandmeyer wrote:
>
> No significant change. The target filesystem only caches non-aligned
> writes, so there usually isn't anything for it to do on fsync anyway.
>

OK. I don't have any more ideas at the moment, and without access to your code, and your platform, I can't really debug it. But do please know that you should only be getting less than 2x writes. I suggest adding instrumentation and trying to come up with a simplified test case.

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 5:00 PM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> > On Mon, Oct 21, 2019 at 10:31 AM Richard Hipp wrote:
> >>
> >> On 10/21/19, Jonathan Brandmeyer wrote:
> >> > Or, how many times is each page written by SQLite for an insert-heavy
> >> > test? The answer appears to be "4", but I can only account for two of
> >> > those four.
> >>
> >> Are you measuring writes at the OS-interface layer, or writes at the
> >> hardware layer? SQLite should issue no more than 2x writes in WAL
> >> mode, and less if the same page is modified more than once. But who
> >> knows how the OS might be multiplying this to accomplish its own
> >> filesystem consistency.
> >>
> >> Consider compiling the
> >> https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into
> >> SQLite on a workstation and running your code there, to get more
> >> details about everything that SQLite is doing with storage.
> >
> > Thanks for the pointer. I didn't do this exactly, but instead
> > instrumented SQLite's unixWrite() and my lowest-level NAND driver's
> > block write function on the target hardware to capture summary
> > counters of both events. The filesystem's own consistency overhead
> > adds up to about 2% additional writes - not nearly enough to account
> > for the missing 2x. The throughput implied by the low-level counters
> > is consistent with the benchmark results.
>
> What happens if you set "PRAGMA synchronous=OFF"?

No significant change. The target filesystem only caches non-aligned writes, so there usually isn't anything for it to do on fsync anyway.

> (I'm not suggesting that you do this in production - it is just an
> experiment to try to help figure out what is going on.)

Understood. If it helps, we are doing a couple of other things that are dissimilar to a typical workstation configuration. We are using the builtin "unix-none" VFS and the -DSQLITE_DEFAULT_WAL_SYNCHRONOUS=1 and -DSQLITE_DEFAULT_LOCKING_MODE=1 compile-time options, along with a design rule that there is at most one database connection per file, and only one thread accesses that connection at a time. The target lacks mmap(), posix file locks, and multiple address spaces, so this seemed like the right settings to use.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
Re: [sqlite] Theoretical write performance for low-throughput devices
On 10/21/19, Jonathan Brandmeyer wrote:
> On Mon, Oct 21, 2019 at 10:31 AM Richard Hipp wrote:
>>
>> On 10/21/19, Jonathan Brandmeyer wrote:
>> > Or, how many times is each page written by SQLite for an insert-heavy
>> > test? The answer appears to be "4", but I can only account for two of
>> > those four.
>>
>> Are you measuring writes at the OS-interface layer, or writes at the
>> hardware layer? SQLite should issue no more than 2x writes in WAL
>> mode, and less if the same page is modified more than once. But who
>> knows how the OS might be multiplying this to accomplish its own
>> filesystem consistency.
>>
>> Consider compiling the
>> https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into
>> SQLite on a workstation and running your code there, to get more
>> details about everything that SQLite is doing with storage.
>
> Thanks for the pointer. I didn't do this exactly, but instead
> instrumented SQLite's unixWrite() and my lowest-level NAND driver's
> block write function on the target hardware to capture summary
> counters of both events. The filesystem's own consistency overhead
> adds up to about 2% additional writes - not nearly enough to account
> for the missing 2x. The throughput implied by the low-level counters
> is consistent with the benchmark results.

What happens if you set "PRAGMA synchronous=OFF"? (I'm not suggesting that you do this in production - it is just an experiment to try to help figure out what is going on.)

--
D. Richard Hipp
d...@sqlite.org
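[Editor's note: for reference, the suggested experiment looks like this at runtime with Python's sqlite3 module. PRAGMA synchronous reports 0 once OFF is set.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Disable fsync entirely -- an experiment only, never a production setting --
# to test whether sync cost (rather than write volume) explains a slowdown.
conn.execute("PRAGMA synchronous = OFF")
sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
print(sync_level)  # 0 means OFF
```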
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 10:31 AM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> > Or, how many times is each page written by SQLite for an insert-heavy
> > test? The answer appears to be "4", but I can only account for two of
> > those four.
>
> Are you measuring writes at the OS-interface layer, or writes at the
> hardware layer? SQLite should issue no more than 2x writes in WAL
> mode, and less if the same page is modified more than once. But who
> knows how the OS might be multiplying this to accomplish its own
> filesystem consistency.
>
> Consider compiling the
> https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into
> SQLite on a workstation and running your code there, to get more
> details about everything that SQLite is doing with storage.

Thanks for the pointer. I didn't do this exactly, but instead instrumented SQLite's unixWrite() and my lowest-level NAND driver's block write function on the target hardware to capture summary counters of both events. The filesystem's own consistency overhead adds up to about 2% additional writes - not nearly enough to account for the missing 2x. The throughput implied by the low-level counters is consistent with the benchmark results.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
Re: [sqlite] Theoretical write performance for low-throughput devices
On 10/21/19, Jonathan Brandmeyer wrote:
> Or, how many times is each page written by SQLite for an insert-heavy
> test? The answer appears to be "4", but I can only account for two of
> those four.

Are you measuring writes at the OS-interface layer, or writes at the hardware layer? SQLite should issue no more than 2x writes in WAL mode, and less if the same page is modified more than once. But who knows how the OS might be multiplying this to accomplish its own filesystem consistency.

Consider compiling the https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into SQLite on a workstation and running your code there, to get more details about everything that SQLite is doing with storage.

--
D. Richard Hipp
d...@sqlite.org
[sqlite] Theoretical write performance for low-throughput devices
Or, how many times is each page written by SQLite for an insert-heavy test? The answer appears to be "4", but I can only account for two of those four.

I'm working on an embedded system that uses a log-structured filesystem on raw NAND flash. This is not your typical workstation's managed flash (SATA/NVMe), or portable managed flash (SD/USB). It's a bare-nekkid ONFI-speaking chip. All reads and writes are one 2kB page at a time. There is no readahead, and no write buffering by the driver or filesystem for page-sized writes.

We got the following performance numbers out of the flash storage:

Streaming reads through the filesystem: 7.5 MB/s.
Streaming writes through the filesystem: 5.4 MB/s.
Single insert performance through SQLite: 0.2 MB/s.
Bulk insert performance through SQLite: 1.3 MB/s, asymptotic for very large transactions.

I do expect the single-insert performance to be much lower than the bulk insert performance. We are using the WAL, and this benchmark includes the cost of a checkpoint at the end. Futzing with the WAL autocheckpoint size has little impact for smaller WAL sizes. My working assumption is that using the WAL for an insertion-heavy workload would consume roughly half of my available throughput due to checkpoint writes. Indeed, if the autocheckpoint level is raised high enough that the entire benchmark fits in the WAL, then I do observe that the write throughput asymptotically approaches 2.6 MB/s instead of 1.3 MB/s.

That leaves one more factor of two somewhere. The table in question has the schema:

```
CREATE TABLE IF NOT EXISTS `chunks` (
    `target_id`  INTEGER NOT NULL,
    `chunk_num`  INTEGER NOT NULL,
    `chunk_blob` BLOB NOT NULL,
    PRIMARY KEY(`target_id`, `chunk_num`)
);
```

Other factors that might help understand our workload: blobs are a little less than 1 kB each, and we're using the default DB page size (4 kB). So I would expect SQLite to pack about 3 rows per page, leaving some extra for primary keys, field delimiters, and other metadata.
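[Editor's note: a rough workstation-side repro of this benchmark shape, using the thread's schema and Python's built-in sqlite3 module. Row counts and blob sizes are assumptions based on the figures above; this only demonstrates the expected ~2x WAL-plus-checkpoint amplification, not the embedded target's extra factor of two.]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        target_id  INTEGER NOT NULL,
        chunk_num  INTEGER NOT NULL,
        chunk_blob BLOB NOT NULL,
        PRIMARY KEY (target_id, chunk_num)
    )""")

blob = bytes(1000)  # ~1 kB rows, as in the thread
with conn:
    conn.executemany("INSERT INTO chunks VALUES (?, ?, ?)",
                     ((1, i, blob) for i in range(1000)))

# Each dirty page is written once to the WAL at commit, then once more
# to the main database at checkpoint.
wal_bytes = os.path.getsize(path + "-wal")
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
db_bytes = os.path.getsize(path)

payload = 1000 * len(blob)
amplification = (wal_bytes + db_bytes) / payload
print(amplification >= 2.0)  # WAL + checkpoint writes total at least 2x payload
```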
I understand that the composite primary key implies an index table that goes with the blobs, which implies some inherent write amplification to account for the index. Still, my expectation is that the write throughput added by the index should be close to the size of the key columns, not the blob columns. So 2x still seems too high.

Any other ideas?

Thanks,
--
Jonathan Brandmeyer
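[Editor's note: the expectation above can be made concrete with a back-of-envelope estimate. Entry sizes are assumptions: SQLite encodes integers as 1-9 byte varints, and each index entry carries a few bytes of record-header and cell overhead.]

```python
blob_bytes = 1000                     # row payload, per the thread
worst_case_index_entry = 2 * 9 + 4    # two worst-case varint keys + overhead
overhead_ratio = worst_case_index_entry / blob_bytes
print(overhead_ratio < 0.05)  # index adds a few percent, nowhere near 2x
```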