On Sun, Sep 11, 2016 at 11:05 AM, Peter Geoghegan <p...@heroku.com> wrote: > So, while there are still a few loose ends with this revision (it > should still certainly be considered WIP), I wanted to get a revision > out quickly because V1 has been left to bitrot for too long now, and > my schedule is very full for the next week, ahead of my leaving to go > on vacation (which is long overdue). Hopefully, I'll be able to get > out a third revision next Saturday, on top of the > by-then-presumably-committed new tape batch memory patch from Heikki, > just before I leave. I'd rather leave with a patch available that can > be cleanly applied, to make review as easy as possible, since it > wouldn't be great to have this V2 with bitrot for 10 days or more.
Heikki committed his preload memory patch a little later than originally expected, 4 days ago. I attach V3 of my own parallel CREATE INDEX patch, which should be applied on top of a today's git master (there is a bugfix that reviewers won't want to miss -- commit b56fb691). I have my own test suite, and have to some extent used TDD for this patch, so rebasing was not so bad . My tests are rather rough and ready, so I'm not going to post them here. (Changes in the WaitLatch() API also caused bitrot, which is now fixed.) Changes from V2: * Since Heikki eliminated the need for any extra memtuple "slots" (memtuples is now only exactly big enough for the initial merge heap), an awful lot of code could be thrown out that managed sizing memtuples in the context of the parallel leader (based on trends seen in parallel workers). I was able to follow Heikki's example by eliminating code for parallel sorting memtuples sizing. Throwing this code out let me streamline a lot of stuff within tuplesort.c, which is cleaned up quite a bit. * Since this revision was mostly focused on fixing up logtape.c (rebasing on top of Heikki's work), I also took the time to clarify some things about how an block-based offset might need to be applied within the leader. Essentially, outlining how and where that happens, and where it doesn't and shouldn't happen. (An offset must sometimes be applied to compensate for difference in logical BufFile positioning (leader/worker differences) following leader's unification of worker tapesets into one big tapset of its own.) * max_parallel_workers_maintenance now supersedes the use of the new parallel_workers index storage parameter. This matches existing heap storage parameter behavior, and allows the implementation to add practically no cycles as compared to master branch when the use of parallelism is disabled by setting max_parallel_workers_maintenance to 0. * New additions to the chapter in the documentation that Robert added a little while back, "Chapter 15. Parallel Query". It's perhaps a bit of a stretch to call this feature part of parallel query, but I think that it works reasonably well. The optimizer does determine the number of workers needed here, so while it doesn't formally produce a query plan, I think the implication that it does is acceptable for user-facing documentation. (Actually, it would be nice if you really could EXPLAIN utility commands -- that would be a handy place to show information about how they were executed.) Maybe this new documentation describes things in what some would consider to be excessive detail for users. The relatively detailed information added on parallel sorting seemed to be in the pragmatic spirit of the new chapter 15, so I thought I'd see what people thought. Work is still needed on: * Cost model. Should probably attempt to guess final index size, and derive calculation of number of workers from that. Also, I'm concerned that I haven't given enough thought to the low end, where with default settings most CREATE INDEX statements will use at least one parallel worker. * The whole way that I teach nbtsort.c to disallow catalog tables for parallel CREATE INDEX due to concerns about parallel safety is in need of expert review, preferably from Robert. It's complicated in a way that relies on things happening or not happening from a distance. * Heikki seems to want to change more about logtape.c, and its use of indirection blocks. That may evolve, but for now I can only target the master branch. * More extensive performance testing. I think that this V3 is probably the fastest version yet, what with Heikki's improvements, but I haven't really verified that. -- Peter Geoghegan
0001-Cap-the-number-of-tapes-used-by-external-sorts.patch.gz
Description: GNU Zip compressed data
0002-Add-parallel-B-tree-index-build-sorting.patch.gz
Description: GNU Zip compressed data
0003-Add-force_btree_randomaccess-GUC-for-testing.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers