On Sun, Feb 4, 2018 at 6:10 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Oliver Ford <ojf...@gmail.com> writes: > > [ 0001-window-frame-v13.patch ] > > I've been hacking on this all week (with breaks for release notes) and > have gotten it into a state that I think is close to committable. > > There was quite a lot I didn't like about the patch initially, notably > that the interaction with operator classes/families was done all wrong. > The idea is to add one support function per opclass, not jam them all > into one opclass that breaks every rule for B-tree opclasses. For one > reason, with this approach there's no chance of dealing with non-default > sort orders imposed by non-default opclasses. (As a concrete example, > suppose that we have two btree opclasses for complex numbers, one that > sorts by real part and one that sorts by imaginary part. You can write > a well-defined in_range function for each of these, but one of them has > to increment the real part and the other the imaginary part.) I whacked > that around and also wrote the missing documentation for the API spec > for in_range functions. The path of least resistance was to dump it > into the nbtree/README text file, which I'm not that satisfied with; > probably it should go in the main SGML docs, but I did not find a good > place to put it. > > I also really didn't like the implementation you'd chosen in > nodeWindowAgg.c to scan the entire partition and build an array of peer > group lengths. That risks running OOM with a large partition. Even if > the array doesn't lead to OOM, the tuplestore will spill to disk with > nasty performance consequences. We should try to ensure that the > tuplestore needn't get larger than the frame, so that well-written queries > with narrow frames can execute without spilling to disk. So I rewrote > that using an idea that had been speculated about in the original > comments, but nobody had gotten to yet: add some more read pointers to > track the frame boundaries, and advance them as needed. I'm not really > sure if this ends up as more or few row comparisons than the other way, > but in any case it uses a fixed amount of memory, which is good. > > Also, the last patch's reimplementation of WinGetFuncArgInFrame isn't > right: AFAICS, it results in any "relpos" that would point to a row > in the exclusion range returning the row just after/before that range, > which is already wrong if the exclusion range is more than one row, > plus it doesn't renumber the rows beyond the exclusion. The behavior > we want is that the frame rows surviving after exclusion should appear > consecutively numbered. (This could be exposed with some tests using > nth_value.) I think the attached rewrite gets this right. Also, punting > entirely on the set-mark problem for SEEK_TAIL cases doesn't satisfy me, > for the same reason as above that we don't want the tuplestore to bloat. > What I did below is to set the mark at the frame start, which at least > gives an opportunity for efficient queries. > > I hacked around on various other things too, for instance the behavior > for null values in RANGE mode didn't seem to be per spec. > > I'm reasonably happy with all the code now, though surely it could use > another look by someone else. I've not yet reviewed the docs (other than > the implementor-oriented details I added), nor have I really looked at the > test cases. I do have a couple suggestions on the test cases: for one, > rather than duplicating the same window definition N times in each query, > use one WINDOW clause and reference it with "OVER windowname". Also, > adding a bunch of columns of different types to a single table seems like > a messy and not easily extensible way of testing different data types. > I'd suggest leaving the existing table alone and adding a new test table > per additional data type you want to test, so that there's an easy > template for testing future additions of more in_range support. > > BTW, something I've not done here but am strongly tempted to do is > run around and change all the uses of "RANGE value PRECEDING/FOLLOWING" > terminology to, say, "RANGE offset PRECEDING/FOLLOWING". "value" is > just way too generic a term for this situation, making documentation > confusing, plus you end up contorting sentences to avoid constructions > like "value of the value". I'm not wedded to "offset" if somebody's got a > better word, but let's try to pick something more specific than "value". > (In the ROWS and GROUPS cases, maybe write "count"? Not entirely sure > what to do for text that's trying to address all three cases, though.) > > What about "extent_size" or just "size"? I see the SQL spec refers to "preceding or following size" in an error message: ("data exception — invalid preceding or following size in window function" )
Best regards Pantelis Theodosiou