Done. The fix is 4.0-only [1] I optimized the ON(..) loop from O(N2) to O(N).
Andrus [1] https://github.com/apache/cayenne/commit/27a99fab4fb06725cba3e71d2a05d9fad129b260 > On Apr 21, 2015, at 9:16 AM, Andrus Adamchik <[email protected]> wrote: > > Looks pretty good to me. I'll run a quick test and commit. > > Thanks, > Andrus > >> On Apr 20, 2015, at 8:39 PM, Øyvind Harboe <[email protected]> wrote: >> >> Hi Andrus, >> >> here's my stab at it. I'm sure it's all sorts of wrong :-) but the >> pathological performance drop on PostgreSQL is gone. >> >> I actually tested this on Cayenne 3 and then copied it over to a patch >> on origin/master. >> >> >> ---------- Forwarded message ---------- >> From: Andrus Adamchik <[email protected]> >> Date: Mon, Apr 20, 2015 at 6:07 PM >> Subject: Re: Problem with catastrophic performance degradation under >> PostgreSQL >> To: [email protected] >> >> >> Hi, >> >> So, Cayenne adds DISTINCT whenever a qualifier includes a to-many >> relationship (for obvious reasons). Since we are dealing with an >> object query, I guess we always know the ID columns and can rewrite >> DISTINCT for PostgreSQL as "DISTINCT ON(id1, id2, ...)" to achieve the >> desired effect. >> >> You want to take a shot at providing a fix for that? >> >> Thanks, >> Andrus >> >> >>> On Apr 20, 2015, at 6:09 PM, Øyvind Harboe <[email protected]> wrote: >>> >>> Q: Would it be a good idea to modify the PostgreSQL Cayenne adapter to use >>> the 'DISTINCT ON()' syntax? >>> >>> I've been testing out PostgreSQL vs. Derby for our application when I ran >>> into a problem where the performance of PostgreSQL went from great to >>> abysmal for no apparent reason. >>> >>> After a bit of digging, I've found that the problem is with the SQL >>> statement that Cayenne generates. >>> >>> Cayenne generates statements of the following form which yields bad >>> performance on PostgreSQL with complicated WHERE statements and numerous >>> columns: >>> >>> 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement >>> >>> If I rewrite this statement to the form below using the 'DISTINCT ON()' >>> syntax(which is PostgreSQL specific dialect), then I get great performance >>> again: >>> >>> 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement >>> >>> >>> >>> >>> Numbers on my machine: >>> >>> 1) 44000ms >>> >>> 2) 4300ms >>> >>> Here's where I read up on the DISTINCT ON syntax: >>> http://www.postgresql.org/docs/9.4/static/sql-select.html >>> >>> >>> >>> -- >>> Øyvind Harboe - Can Zylin Consulting help on your project? >>> http://www.zylin.com/ >> >> >> >> >> -- >> Øyvind Harboe - Can Zylin Consulting help on your project? >> http://www.zylin.com/ >> <0001-postgresql-improves-performance-by-taking-advantage-.patch> > >
