On Sun, Nov 20, 2016 at 11:54 AM, John Gorman <johngorm...@gmail.com> wrote:
> I reviewed the dht-v2.patch and found that it is in excellent shape.
Thanks for reviewing! And sorry for the late reply.

> Benchmarking shows that this performs somewhat faster than
> dynahash which is surprising because it is doing DSA address
> translations on the fly.

That is indeed surprising and I think warrants more investigation.

> One area where this could run faster is to reduce the amount of
> time when the hash is in the RESIZE_IN_PROGRESS state.
>
> When a hash partition reaches 75% full, a resize begins by allocating
> an empty new hash bucket array with double the number of buckets.
> This begins the RESIZE_IN_PROGRESS state, during which there are
> both an old and a new hash bucket array.
>
> During the RESIZE state all operations such as find, insert,
> delete and iterate run slower due to having to look items up in
> two hash bucket arrays.
>
> Whenever a new item is inserted into the hash while it is in the
> RESIZE state, the current code takes the time to move one item
> from the old hash partition to the new hash partition. During
> insertion an exclusive lock is already held on the partition, so
> this is an efficient time to do the resize cleanup work.
>
> However, since we only clean up one item per insertion, we do not
> complete the cleanup and free the old hash bucket array until we
> are due to start yet another resize operation.
>
> So we are effectively always in the RESIZE state, which slows down
> operations and wastes some space.

Right, that is the case in a workload that inserts a bunch of data but
then becomes read-only forever.  A workload that constantly does a mix
of writing and reading should settle down at a reasonable size and
stop resizing.

> Here are the timings for insert and find in nanoseconds on a
> MacBook Pro. The insert_resize figure includes the resize work to
> move one item from the old to the new hash bucket array.
> insert_resize: 945.98
> insert_clean:  450.39
> find_resize:   348.90
> find_clean:    293.16
>
> The attached dht-v2-resize-cleanup.patch applies on top of the
> dht-v2.patch and speeds up the resize cleanup process so that
> hashes spend most of their time in the clean state.
>
> It does this by cleaning up more than one old item during
> inserts. This patch cleans up three old items.
>
> There is also the case where a hash receives all of its inserts
> at the beginning and the rest of the workload is all finds. This
> patch also cleans up two items for each find.
>
> The downside of doing extra cleanup early is some additional
> latency. Here are the experimental figures and the approximate
> formulas for different numbers of cleanups for inserts. Note that
> we cannot go lower than one cleanup per insert.
>
> Resize work in inserts: 729.79 insert + 216.19 * cleanups
> 1   945.98
> 2  1178.13
> 3  1388.73
> 4  1617.04
> 5  1796.91
>
> Here are the timings for different numbers of cleanups for finds.
> Note that since we do not hold an exclusive partition lock on a find,
> there is the additional overhead of taking that lock.
>
> Resize work in finds: 348.90 dirty_find + 233.45 lock + 275.78 * cleanups
> 0   348.90
> 1   872.88
> 2  1138.98
> 3  1448.94
> 4  1665.31
> 5  1975.91
>
> The new effective latencies during the resize vs. the clean states:
>
> #define DHT_INSERT_CLEANS 3
> #define DHT_SEARCH_CLEANS 2
>
> insert_resize: 1388.73  -- 3 cleaned
> insert_clean:   450.39
> find_resize:   1138.98  -- 2 cleaned
> find_clean:     293.16
>
> The overall performance will be faster due to not having to examine
> more than one hash bucket array most of the time.

Thanks for doing all these experiments.  Yeah, I think it makes sense
to merge this change to improve that case.  Since Dilip Kumar's
Parallel Bitmap Heap Scan project is no longer using this, I think I
should park it here unless/until another potential use case pops up.
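For anyone following along, here is a toy, single-threaded C sketch of
the incremental cleanup scheme described above.  There is no DSA, no
partitioning and no locking here, and every name (toyht_*,
INSERT_CLEANS, FIND_CLEANS) is illustrative rather than the patch's
actual API; it only shows the shape of the idea: while a resize is in
progress, each insert migrates up to three items and each find up to
two items from the old bucket array, so the table leaves the RESIZE
state quickly instead of lingering in it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INSERT_CLEANS 3         /* old items migrated per insert while resizing */
#define FIND_CLEANS   2         /* old items migrated per find while resizing */

typedef struct item
{
    int         key;
    int         value;
    struct item *next;
} item;

typedef struct
{
    item      **buckets;        /* current bucket array */
    item      **old_buckets;    /* non-NULL => RESIZE_IN_PROGRESS */
    int         nbuckets;
    int         old_nbuckets;
    int         nitems;
    int         clean_bucket;   /* next old bucket to drain */
} toyht;

static void
toyht_init(toyht *ht, int nbuckets)
{
    memset(ht, 0, sizeof(*ht));
    ht->nbuckets = nbuckets;
    ht->buckets = calloc(nbuckets, sizeof(item *));
}

/* Move up to 'budget' items from the old bucket array into the new one. */
static void
clean_some(toyht *ht, int budget)
{
    while (budget > 0 && ht->old_buckets != NULL)
    {
        item       *it = ht->old_buckets[ht->clean_bucket];

        if (it != NULL)
        {
            ht->old_buckets[ht->clean_bucket] = it->next;
            it->next = ht->buckets[it->key % ht->nbuckets];
            ht->buckets[it->key % ht->nbuckets] = it;
            budget--;
        }
        else if (++ht->clean_bucket == ht->old_nbuckets)
        {
            /* Old array fully drained: leave the RESIZE state. */
            free(ht->old_buckets);
            ht->old_buckets = NULL;
        }
    }
}

static void
toyht_insert(toyht *ht, int key, int value)
{
    item       *it = malloc(sizeof(item));

    clean_some(ht, INSERT_CLEANS);      /* an insert already holds the lock */
    it->key = key;
    it->value = value;
    it->next = ht->buckets[key % ht->nbuckets];
    ht->buckets[key % ht->nbuckets] = it;
    if (++ht->nitems * 4 > ht->nbuckets * 3 && ht->old_buckets == NULL)
    {
        /* 75% full: start a resize with double the number of buckets. */
        ht->old_buckets = ht->buckets;
        ht->old_nbuckets = ht->nbuckets;
        ht->clean_bucket = 0;
        ht->nbuckets *= 2;
        ht->buckets = calloc(ht->nbuckets, sizeof(item *));
    }
}

static item *
toyht_find(toyht *ht, int key)
{
    item       *it;

    if (ht->old_buckets != NULL)
        clean_some(ht, FIND_CLEANS);    /* real code must take the lock here */
    for (it = ht->buckets[key % ht->nbuckets]; it != NULL; it = it->next)
        if (it->key == key)
            return it;
    if (ht->old_buckets != NULL)        /* still resizing: probe old array too */
        for (it = ht->old_buckets[key % ht->old_nbuckets]; it != NULL; it = it->next)
            if (it->key == key)
                return it;
    return NULL;
}
```

The insert-heavy-then-read-only case falls out naturally: after the
inserts stop partway through a resize, a handful of finds drain the
remaining old items, and from then on every find probes only one
bucket array.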
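The cost models above are easy to evaluate directly.  A couple of
details worth making explicit: the insert model only applies for one
or more cleanups, and at zero cleanups a find skips the exclusive
lock, so only the dirty_find term remains.  The function names below
are mine, the constants are the quoted fits, and since the formulas
are approximate they reproduce the measured rows only to within a few
tens of nanoseconds.

```c
/* Evaluate the approximate resize-latency models quoted above (in ns). */
#include <assert.h>

static double
insert_resize_ns(int cleanups)          /* valid for cleanups >= 1 */
{
    return 729.79 + 216.19 * cleanups;
}

static double
find_resize_ns(int cleanups)
{
    if (cleanups == 0)
        return 348.90;                  /* no cleanups: no exclusive lock */
    return 348.90 + 233.45 + 275.78 * cleanups;
}
```

With the patch's DHT_INSERT_CLEANS = 3 and DHT_SEARCH_CLEANS = 2, the
models predict roughly 1378 ns per insert and 1134 ns per find during
a resize, in the same ballpark as the measured 1388.73 and 1138.98.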
Some interesting candidates have been mentioned already, and I'm fairly
sure there are other uses too, but I don't expect anyone to be
interested in committing this patch until something concrete shows up,
so I'll go work on other things until then.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers