On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Sokolov Yura <funny.fal...@postgrespro.ru> writes: >> [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ] > > I started to review this patch. I spent a fair amount of time on > beautifying the code, because I found it rather ugly and drastically > undercommented. Once I had it to the point where it seemed readable, > I went to check the shellsort algorithm against Wikipedia's entry, > and found that this appears to be an incorrect implementation of > shellsort: where pg_shell_sort_pass has > > for (_i = off; _i < _n; _i += off) \ > > it seems to me that we need to have > > for (_i = off; _i < _n; _i += 1) \ > > or maybe just _i++. As-is, this isn't h-sorting the whole file, > but just the subset of entries that have multiple-of-h indexes > (ie, the first of the h distinct subfiles that should get sorted). > The bug is masked by the final pass of plain insertion sort, but > we are not getting the benefit we should get from the earlier passes. > > However, I'm a bit dubious that it's worth fixing that; instead > my inclination would be to rip out the shellsort implementation > entirely. The code is only using it for the nitems <= 48 case > (which makes the first three offset steps certainly no-ops) and > I am really unconvinced that it's worth expending the code space > for a shellsort rather than plain insertion sort in that case, > especially when we have good reason to think that the input data > is nearly sorted.
I actually noticed that and benchmarked some variants. Neither made any noticeable difference in performance, so I decided not to complain about them. I guess the same case can be made for removing the shell sort. So I'm inclined to agree. > BTW, the originally given test case shows no measurable improvement > on my box. I did manage to reproduce the original test and got a consistent improvement. > I was eventually able to convince myself by profiling > that the patch makes us spend less time in compactify_tuples, but > this test case isn't a very convincing one. > > So, quite aside from the bug, I'm not excited about committing the > attached as-is. I think we should remove pg_shell_sort and just > use pg_insertion_sort. If somebody can show a test case that > provides a measurable speed improvement from the extra code, > I could be persuaded to reconsider. My tests modifying the shell sort didn't produce any measurable difference, but I didn't test removing it altogether. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers