On 18 August 2014 09:05, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
>> 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas <hlinnakan...@vmware.com>:
>>
>>> * I'm getting two regression failures with this (opr_sanity and join).
>>
>> The opr_sanity failure is corrected, but the question about join remains.
>> I checked the latest version of my github repo and there's no failure in
>> the join regression test: all 145 tests passed.
>> To tell the truth, I don't understand which changes could have led to this
>> failure. Could you show me the regression diffs?
>
> Sure, here you go. It seems like a change in a plan. At a quick glance it
> seems harmless: the new plan is identical except that the left and right
> side of a join have been reversed. But I don't understand either why this
> patch would change that, so it needs to be investigated.
>
>>> * The regression test queries that use LIMIT are not guaranteed to
>>> always return the same rows, hence they're not very good regression
>>> test cases. I'd suggest using more restricting WHERE clauses, so that
>>> each query only returns a handful of rows.
>>
>> Thank you for the comment, I rewrote the wrong queries. But could you
>> explain why LIMIT queries may return different results? Does it happen
>> because of different query optimization?
>
> Imagine that you have a table with two rows, A and B. If you run a query
> like "SELECT * FROM table LIMIT 1", the system can legally return either
> row A or B, because there's no ORDER BY.
>
> Now, admittedly you have a similar problem even without the LIMIT - the
> system can legally return the rows in either order - but it's less of an
> issue because at least you can quickly see from the diff that the result
> set is in fact the same, the rows are just in a different order. You could
> fix that by adding an ORDER BY to all test queries, but we haven't done
> that in the regression suite because then we would not have any test
> coverage for cases where you don't have an ORDER BY.
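(To illustrate the point about LIMIT: without an ORDER BY, *any* subset of
n rows is a legal answer to LIMIT n, so a regression test's expected output
is not well-defined. A toy model in Python, not database code:)

```python
import itertools

rows = [("A",), ("B",)]

def legal_limit_results(rows, n):
    """All result sets a database could legally return for LIMIT n when
    there is no ORDER BY: any n of the rows, in any order."""
    return {tuple(perm[:n]) for perm in itertools.permutations(rows)}

results = legal_limit_results(rows, 1)
print(len(results))  # 2 -- either row alone is a legal answer to LIMIT 1
```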
> As a compromise, test cases are usually written without an ORDER BY, but
> if e.g. the buildfarm starts failing because of differences in the result
> set order across platforms, then we add an ORDER BY to make it stable.
>
>>> * I think it's leaking memory, in the GiST scan context. I tested this
>>> with a variant of the regression tests:
>>>
>>> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)),
>>>                            point(0.05*i, 0.05*i) FROM generate_series(0, 10000000) as i;
>>> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>>>
>>> set enable_seqscan=off;
>>> set enable_bitmapscan=off;
>>>
>>> explain analyze select p from gist_tbl where p <@ box(point(0,0),
>>> point(9999999,9999999)) and length(p::text) < 10;
>>>
>>> While the final query runs, 'top' shows constantly increasing memory
>>> usage.
>>
>> I don't think it's a memory leak. After some increase, memory usage
>> remains the same. It works similarly without using indexonlyscan.
>
> No, it's definitely a leak caused by the patch. Test with the attached
> patch, which prints to the server log the amount of memory used by the
> GiST scan memory context every 10000 rows. It clearly shows increasing
> memory usage all the way to the end of the query.
>
> It's cleaned up at the end of the query, but that's not good enough,
> because for a large query you might accumulate gigabytes of leaked memory
> before the query has finished. If you (manually) apply the same patch to
> git master, you'll see that the memory usage stays consistent and small.

Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review? It
would be good if we could get your work into PostgreSQL 9.5.

Thanks

Thom
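P.S. To make the leak pattern from Heikki's review concrete: the problem is
not that memory is never freed, but that per-row scratch allocations land in
a context that is only reset at the end of the query. A rough sketch in
Python (a toy model, not the real palloc/MemoryContext API):

```python
class ToyMemoryContext:
    """Toy stand-in for a PostgreSQL memory context (illustration only)."""
    def __init__(self):
        self.allocated_bytes = 0
    def palloc(self, nbytes):
        self.allocated_bytes += nbytes
    def reset(self):
        self.allocated_bytes = 0

scan_context = ToyMemoryContext()     # lives until the end of the query
per_row_context = ToyMemoryContext()  # reset after every row

for _row in range(10_000):
    scan_context.palloc(64)      # leaky: freed only when the query ends
    per_row_context.palloc(64)   # correct: scratch data freed per row
    per_row_context.reset()

print(scan_context.allocated_bytes)     # grows in proportion to row count
print(per_row_context.allocated_bytes)  # stays flat at 0
```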