Re: [sqlite] Causal profiling
On Wed, Dec 25, 2019 at 4:25 PM Richard Hipp wrote:
>
> The video provided details on what they did. I could not find any
> performance improvement by making mutexes direct calls instead of
> indirect calls.

This kind of optimization improves performance by reducing pressure on
the CPU's branch-target address caching and prediction.

> Maybe they are measuring performance differently than
> me. (I use CPU cycle counts measured by valgrind.)

By default, valgrind doesn't model either branch predictors or
branch-target address caches. Its model is somewhat primitive [0], but
it is available through the command-line option `--branch-sim` [1,2].
When you performance-tested this change, did you enable that option?
tool/run-speed-test.sh certainly doesn't.

Of course, since those structures are shared globally, the performance
benefit for de-virtualizing any given function call is highly
context-dependent.

Sincerely,
--
Jonathan Brandmeyer

[0]: https://valgrind.org/docs/manual/cg-manual.html#branch-sim
[1]: https://www.valgrind.org/docs/manual/cl-manual.html#cl-manual.options.simulation
[2]: https://valgrind.org/docs/manual/cg-manual.html#cg-manual.cgopts

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 5:47 PM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> >
> > No significant change. The target filesystem only caches non-aligned
> > writes, so there usually isn't anything for it to do on fsync anyway.
>
> OK. I don't have any more ideas at the moment, and without access to
> your code, and your platform, I can't really debug it. But do please
> know that you should only be getting less than 2x writes. I suggest
> adding instrumentation and trying to come up with a simplified test
> case.

Should the vLogVfs be picking up writes to both the WAL and the
database proper? I'm seeing a number of WRITEs in the log equal to the
final number of pages, plus a small percentage. For the case where the
WAL is much larger than the benchmark, the number of WRITEs in the log
is almost equal to the final number of pages in the database.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
Re: [sqlite] Theoretical write performance for low-throughput devices
On Tue, Oct 22, 2019 at 2:03 AM Wout Mertens wrote:
>
> This probably won't change a thing, but I wonder why you wouldn't set
> the sqlite page size to 2KB? Hopefully this means lots of aligned
> writes.

At one point, the row blobs were exactly 1024 bytes. This isn't great
for 4 kB pages, but it is a particularly unfavorable row size for 2 kB
database pages.

--
Jonathan Brandmeyer
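The arithmetic behind that claim can be sketched quickly. The overhead
constants below are rough assumptions for illustration, not exact
SQLite internals:

```python
# Back-of-the-envelope: how many ~1 kB rows fit on one table leaf page?
# PAGE_HEADER, CELL_OVERHEAD, and RECORD are approximations, not exact
# SQLite figures.
PAGE_HEADER = 8             # leaf b-tree page header, bytes
CELL_OVERHEAD = 2 + 3 + 2   # cell pointer + payload-size varint + rowid varint
RECORD = 1024 + 6           # 1024-byte blob plus a small record header

def rows_per_leaf(page_size: int) -> int:
    usable = page_size - PAGE_HEADER
    return usable // (RECORD + CELL_OVERHEAD)

print(rows_per_leaf(2048))  # 1 -> roughly half of every 2 kB page is wasted
print(rows_per_leaf(4096))  # 3 -> much better packing on 4 kB pages
```

Under these assumptions, only one 1024-byte row fits in a 2 kB page,
so nearly half of every leaf page would sit empty.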
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 5:00 PM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> > On Mon, Oct 21, 2019 at 10:31 AM Richard Hipp wrote:
> >>
> >> On 10/21/19, Jonathan Brandmeyer wrote:
> >> > Or, how many times is each page written by SQLite for an
> >> > insert-heavy test? The answer appears to be "4", but I can only
> >> > account for two of those four.
> >>
> >> Are you measuring writes at the OS-interface layer, or writes at the
> >> hardware layer? SQLite should issue no more than 2x writes in WAL
> >> mode, and less if the same page is modified more than once. But who
> >> knows how the OS might be multiplying this to accomplish its own
> >> filesystem consistency.
> >>
> >> Consider compiling the
> >> https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into
> >> SQLite on a workstation and running your code there, to get more
> >> details about everything that SQLite is doing with storage.
> >
> > Thanks for the pointer. I didn't do this exactly, but instead
> > instrumented SQLite's unixWrite() and my lowest-level NAND driver's
> > block write function on the target hardware to capture summary
> > counters of both events. The filesystem's own consistency overhead
> > adds up to about 2% additional writes - not nearly enough to account
> > for the missing 2x. The throughput implied by the low-level counters
> > is consistent with the benchmark results.
>
> What happens if you set "PRAGMA synchronous=OFF".

No significant change. The target filesystem only caches non-aligned
writes, so there usually isn't anything for it to do on fsync anyway.

> (I'm not suggesting that you do this in production - it is just an
> experiment to try to help figure out what is going on.)

Understood.

If it helps, we are doing a couple of other things that are dissimilar
to a typical workstation configuration. We are using the builtin
"unix-none" VFS and the -DSQLITE_DEFAULT_WAL_SYNCHRONOUS=1 and
-DSQLITE_DEFAULT_LOCKING_MODE=1 compile-time options, along with a
design rule that there are zero or one database connections per file,
and only one thread accesses that connection at a time. The target
lacks mmap(), POSIX file locks, and multiple address spaces, so these
seemed like the right settings to use.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
Re: [sqlite] Theoretical write performance for low-throughput devices
On Mon, Oct 21, 2019 at 10:31 AM Richard Hipp wrote:
>
> On 10/21/19, Jonathan Brandmeyer wrote:
> > Or, how many times is each page written by SQLite for an insert-heavy
> > test? The answer appears to be "4", but I can only account for two of
> > those four.
>
> Are you measuring writes at the OS-interface layer, or writes at the
> hardware layer? SQLite should issue no more than 2x writes in WAL
> mode, and less if the same page is modified more than once. But who
> knows how the OS might be multiplying this to accomplish its own
> filesystem consistency.
>
> Consider compiling the
> https://www.sqlite.org/src/file/ext/misc/vfslog.c extension into
> SQLite on a workstation and running your code there, to get more
> details about everything that SQLite is doing with storage.

Thanks for the pointer. I didn't do this exactly, but instead
instrumented SQLite's unixWrite() and my lowest-level NAND driver's
block write function on the target hardware to capture summary counters
of both events. The filesystem's own consistency overhead adds up to
about 2% additional writes - not nearly enough to account for the
missing 2x. The throughput implied by the low-level counters is
consistent with the benchmark results.

Thanks,
--
Jonathan Brandmeyer
PlanetiQ
[sqlite] Theoretical write performance for low-throughput devices
Or, how many times is each page written by SQLite for an insert-heavy
test? The answer appears to be "4", but I can only account for two of
those four.

I'm working on an embedded system that uses a log-structured filesystem
on raw NAND flash. This is not your typical workstation's managed flash
(SATA/NVMe), or portable managed flash (SD/USB). It's a bare-nekkid
ONFI-speaking chip. All reads and writes are one 2 kB page at a time.
There is no readahead, and no write buffering by the driver or
filesystem for page-sized writes.

We got the following performance numbers out of the flash storage:

Streaming reads through the filesystem: 7.5 MB/s
Streaming writes through the filesystem: 5.4 MB/s
Single insert performance through SQLite: 0.2 MB/s
Bulk insert performance through SQLite: 1.3 MB/s, asymptotic for very
large transactions

I do expect the single-insert performance to be much lower than the
bulk insert performance. We are using the WAL, and this benchmark
includes the cost of a checkpoint at the end. Futzing with the WAL
autocheckpoint size has little impact for smaller WAL sizes.

My working assumption is that using the WAL for an insertion-heavy
workload would consume roughly half of my available throughput due to
checkpoint writes. Indeed, if the autocheckpoint level is raised high
enough that the entire benchmark fits in the WAL, then I do observe
that the write throughput asymptotically approaches 2.6 MB/s instead of
1.3 MB/s. That leaves one more factor of two somewhere.

The table in question has the schema:

```
CREATE TABLE IF NOT EXISTS `chunks` (
    `target_id`  INTEGER NOT NULL,
    `chunk_num`  INTEGER NOT NULL,
    `chunk_blob` BLOB NOT NULL,
    PRIMARY KEY(`target_id`,`chunk_num`)
);
```

Other factors that might help understand our workload: blobs are a
little less than 1 kB each, and we're using the default DB page size
(4 kB). So I would expect SQLite to pack about 3 rows per page, leaving
some extra for primary keys, field delimiters, and other metadata.

I understand that the composite primary key implies an index that goes
with the blobs, which implies some inherent write amplification to
account for the index. Still, my expectation is that the write
throughput added by the index should be close to the size of the key
columns, not the blob columns. So 2x still seems too high.

Any other ideas?

Thanks,
--
Jonathan Brandmeyer
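For what it's worth, the packing expectation is easy to check on a
workstation with Python's sqlite3 module. The 3000-row count and
1000-byte blob below are arbitrary stand-ins for the real workload, not
the actual benchmark parameters:

```python
import os
import sqlite3
import tempfile

# Stand-in workload: ~1 kB blobs into the same schema, 4 kB pages, WAL.
path = os.path.join(tempfile.mkdtemp(), "chunks.db")
db = sqlite3.connect(path)
db.execute("PRAGMA page_size = 4096")       # must precede first write
db.execute("PRAGMA journal_mode = WAL")
db.execute("""CREATE TABLE chunks (
                  target_id  INTEGER NOT NULL,
                  chunk_num  INTEGER NOT NULL,
                  chunk_blob BLOB NOT NULL,
                  PRIMARY KEY (target_id, chunk_num))""")
blob = os.urandom(1000)                     # a little less than 1 kB
with db:
    db.executemany("INSERT INTO chunks VALUES (?, ?, ?)",
                   ((1, n, blob) for n in range(3000)))
db.execute("PRAGMA wal_checkpoint(TRUNCATE)")   # fold the WAL back in
pages = db.execute("PRAGMA page_count").fetchone()[0]
print(f"{3000 / pages:.1f} rows per page")
```

On these assumptions the main database should come out somewhere around
3-4 rows per page, index overhead included, which is one way to confirm
that the index is not where the extra factor of two is hiding.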
Re: [sqlite] Should SQLite distinguish between +0.0 and -0.0 on output?
Regarding additional uses for the sign of zero:

Branch Cuts for Complex Elementary Functions, or Much Ado About
Nothing's Sign Bit, in The State of the Art in Numerical Analysis
(eds. Iserles and Powell), Clarendon Press, Oxford, 1987.
https://people.freebsd.org/~das/kahan86branch.pdf

On Wed, Jun 12, 2019 at 4:40 PM Richard Hipp wrote:
>
> On 6/12/19, James K. Lowden wrote:
> > what kind of computation
> > would lead to a value in memory representing -0.0?
>
> 0.0/-1.0
>
> --
> D. Richard Hipp
> d...@sqlite.org

--
Jonathan Brandmeyer
Vice President of Software Engineering
PlanetiQ
Re: [sqlite] Should SQLite distinguish between +0.0 and -0.0 on output?
On Wed, Jun 12, 2019 at 12:02 PM David Raymond wrote:
> https://www.sqlite.org/fileformat2.html#record_format
>
> The storage type of each record is given by an integer. And in the
> current format, all non-negative integers are used.

Ouch. Yes, an additional data type was closer to what I had in mind.

> -----Original Message-----
> From: sqlite-users On Behalf Of Thomas Kurz
> Sent: Wednesday, June 12, 2019 1:05 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] Should SQLite distinguish between +0.0 and -0.0
> on output?
>
> > It would also be a file format change, rendering about 1e12 existing
> > database files obsolete.
Re: [sqlite] Should SQLite distinguish between +0.0 and -0.0 on output?
IMO, when acting as a storage engine, SQLite should be good to the last
bit. The sign of zero, the least significant bit of any mantissa, and
the payload bits of NaN should all be returned exactly as they were
stored.

The history of almost-but-not-quite-IEEE754 arithmetic has been
convergence towards fully-IEEE754 pretty much everywhere. People who
are serious about floating point rely on its features. Even though
serious users are a much smaller community than casual floating-point
users, the burden of maintaining two sets of semantics means that
you're better off just supporting IEEE754. For an SQL engine, the
next-best thing to strict binary IEEE754 is not sloppy binary IEEE754;
it's probably strict decimal IEEE754.

On Wed, Jun 12, 2019 at 7:35 AM Richard Hipp wrote:
> IEEE754 floating point numbers have separate representations for +0.0
> and -0.0. As currently implemented, SQLite always displays both
> quantities as just "0.0".
>
> Question: Should SQLite be enhanced to show -0.0 as "-0.0"? Or,
> would that create unnecessary confusion?
>
> --
> D. Richard Hipp
> d...@sqlite.org

--
Jonathan Brandmeyer
Vice President of Software Engineering
PlanetiQ
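A quick way to poke at the current behavior from Python's sqlite3
module; this is an experiment to run, not a statement of what SQLite
guarantees, and whether the sign bit survives the round trip is exactly
the open question:

```python
import math
import sqlite3
import struct

neg_zero = 0.0 / -1.0
# IEEE754 negative zero is all-zero except the sign bit:
assert struct.pack(">d", neg_zero).hex() == "8000000000000000"
assert neg_zero == 0.0                       # yet it compares equal to +0.0
assert math.copysign(1.0, neg_zero) == -1.0  # the sign bit is observable

# Round-trip through a table and inspect what comes back.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (v REAL)")
db.execute("INSERT INTO t VALUES (?)", (neg_zero,))
(v,) = db.execute("SELECT v FROM t").fetchone()
print(math.copysign(1.0, v))  # -1.0 only if storage was bit-exact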
Re: [sqlite] Please help me fix the SQLite Git mirror
On Mon, Apr 22, 2019 at 12:22 PM Richard Hipp wrote:
> But before I proceed, I would like to better understand how rewiring
> the refs this way constitutes "changing history". The refs/heads
> entries are all ephemeral - they are constantly changing on their own,
> and no historical record of their past values is retained. So if I
> modify the refs to synchronize with the canonical Fossil repository,
> how is that changing history, exactly?

Certainly no new SHAs were created, so this is a much less obvious
rewrite than if you had performed a rebase of some kind. Nonetheless, I
claim that this constitutes rewriting history because it has a similar
impact on downstream users.

Some user-visible symptoms, after a user had already synchronized to
the master which was later abandoned:

- From a context of master, `git pull` alone would construct a merge
  commit between the abandoned branch and the new master. `git pull
  --ff-only` would fail.
- From a context of a custom patch series, `git rebase master` has
  unexpected effects, in that it also rebases the mistake you tried to
  orphan.
- `git fetch` shows a forced update to origin/master.
- A user who was using a merge-based workflow and had merged to your
  mistake branch would have a rough time following the change in branch
  name. One method would be to construct a new merge to the newly
  corrected master and then rebase any of their subsequent changes onto
  the new merge commit. Their workflow is no longer strictly
  merge-based, and they still have to deal with the impacts of
  rewriting their history. Alternatively, they could construct the
  inverse of the mistake via `git revert` onto their own working branch
  and then merge again against the new master.

These user-visible impacts and the recovery actions are almost the same
as what a Git user would see if you had initially constructed
(A, B, C, D) and rewritten it to be (A, C', D') instead via a rebase.

IMO, the proper corrective action after pushing the commit with a
mistake in it would have been to commit the inverse of the mistake and
then merge it to the alternate path. Yes, it would have constructed a
merge commit in the history, which is unfortunate when you are trying
to maintain a clean and linear history. But the impact on downstream
users would have been negligible. `git pull --ff-only` would have Just
Worked, `git rebase master` from a patch series would have Just Worked,
and a merge-based workflow would have Just Worked, too.

--
Jonathan Brandmeyer
Re: [sqlite] Please help me fix the SQLite Git mirror
```
# Construct the matching branch name
git branch mistake 9b888fc
# Push the name alone to the remote
git push -u origin mistake
# Move the name of master
git checkout master && git reset --hard
# Push the new name of master
git push --force
```

`git reset --hard` will move the name of the current working branch to
another SHA, which is why you need to first check out the branch being
moved: it's context-sensitive.

You are rewriting history, though. It shouldn't construct any new SHAs,
but the impact on a downstream user's workflow is rough. Once it got
published to public Git, the least impactful way forward would be to
construct the inverse of the mistake and push that as its own commit
instead of orphaning it. `git revert` does this in git-land.

If I'm maintaining some patches against your master, then my normal
workflow might be to rebase them against the current master every once
in a while, with just `git rebase master`. If I did that once to rebase
against the SHA which was then named `master`, and then invoke
`git rebase master` again after your change to history, then the second
rebase will also attempt to rebase your mistake onto the corrected
master. Users would need to perform a one-time
`git rebase --onto master mistake ` instead.

--
Jonathan Brandmeyer
Re: [sqlite] Stack allocation upper bound with SQLITE_USE_ALLOCA
On Wed, Apr 3, 2019 at 8:55 AM Richard Hipp wrote:
>
> On 4/3/19, Jonathan Brandmeyer wrote:
> > What is the upper bound for stack consumption under the
> > SQLITE_USE_ALLOCA compile-time option? I see that there are a number
> > of configurable size limits available as compile-time and/or
> > run-time options. Which ones affect the maximum alloca?
>
> I think the maximum alloca() allocation will be 7x the page size for
> the database file. So a little less than 0.5 MB assuming a maximum
> page size of 64K.

Thanks!
--
Jonathan Brandmeyer
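Taking the 7x figure above at face value, the arithmetic works out as:

```python
# Worst-case alloca() under the 7x-page-size estimate quoted above.
SQLITE_MAX_PAGE_SIZE = 65536                  # largest database page size
worst_case_alloca = 7 * SQLITE_MAX_PAGE_SIZE
print(worst_case_alloca)  # 458752 bytes = 448 KiB, a little under 0.5 MB
```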
[sqlite] Stack allocation upper bound with SQLITE_USE_ALLOCA
What is the upper bound for stack consumption under the
SQLITE_USE_ALLOCA compile-time option? I see that there are a number of
configurable size limits available as compile-time and/or run-time
options. Which ones affect the maximum alloca?

Thanks,
Jonathan Brandmeyer