Ok, it appears that our implementation was too naive: https://issues.apache.org/jira/browse/CAY-2005
DISTINCT ON in PostgreSQL is tied to the ORDER BY clause. We need to find a better solution. Andrus > On Apr 21, 2015, at 1:18 PM, Øyvind Harboe <[email protected]> wrote: > > Nice. > > Thanks! > > Our app is stuck on Cayenne 3 until we can pay off some technical debt. > > On Tue, Apr 21, 2015 at 11:55 AM, Andrus Adamchik > <[email protected]> wrote: >> Done. The fix is 4.0-only [1] I optimized the ON(..) loop from O(N2) to O(N). >> >> Andrus >> >> [1] >> https://github.com/apache/cayenne/commit/27a99fab4fb06725cba3e71d2a05d9fad129b260 >> >> >>> On Apr 21, 2015, at 9:16 AM, Andrus Adamchik <[email protected]> wrote: >>> >>> Looks pretty good to me. I'll run a quick test and commit. >>> >>> Thanks, >>> Andrus >>> >>>> On Apr 20, 2015, at 8:39 PM, Øyvind Harboe <[email protected]> wrote: >>>> >>>> Hi Andrus, >>>> >>>> here's my stab at it. I'm sure it's all sorts of wrong :-) but the >>>> pathological performance drop on PostgreSQL is gone. >>>> >>>> I actually tested this on Cayenne 3 and then copied it over to a patch >>>> on origin/master. >>>> >>>> >>>> ---------- Forwarded message ---------- >>>> From: Andrus Adamchik <[email protected]> >>>> Date: Mon, Apr 20, 2015 at 6:07 PM >>>> Subject: Re: Problem with catastrophic performance degradation under >>>> PostgreSQL >>>> To: [email protected] >>>> >>>> >>>> Hi, >>>> >>>> So, Cayenne adds DISTINCT whenever a qualifier includes a to-many >>>> relationship (for obvious reasons). Since we are dealing with an >>>> object query, I guess we always know the ID columns and can rewrite >>>> DISTINCT for PostgreSQL as "DISTINCT ON(id1, id2, ...)" to achieve the >>>> desired effect. >>>> >>>> You want to take a shot at providing a fix for that? >>>> >>>> Thanks, >>>> Andrus >>>> >>>> >>>>> On Apr 20, 2015, at 6:09 PM, Øyvind Harboe <[email protected]> >>>>> wrote: >>>>> >>>>> Q: Would it be a good idea to modify the PostgreSQL Cayenne adapter to use >>>>> the 'DISTINCT ON()' syntax? >>>>> >>>>> I've been testing out PostgreSQL vs. Derby for our application when I ran >>>>> into a problem where the performance of PostgreSQL went from great to >>>>> abysmal for no apparent reason. >>>>> >>>>> After a bit of digging, I've found that the problem is with the SQL >>>>> statement that Cayenne generates. >>>>> >>>>> Cayenne generates statements of the following form which yields bad >>>>> performance on PostgreSQL with complicated WHERE statements and numerous >>>>> columns: >>>>> >>>>> 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement >>>>> >>>>> If I rewrite this statement to the form below using the 'DISTINCT ON()' >>>>> syntax(which is PostgreSQL specific dialect), then I get great performance >>>>> again: >>>>> >>>>> 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement >>>>> >>>>> >>>>> >>>>> >>>>> Numbers on my machine: >>>>> >>>>> 1) 44000ms >>>>> >>>>> 2) 4300ms >>>>> >>>>> Here's where I read up on the DISTINCT ON syntax: >>>>> http://www.postgresql.org/docs/9.4/static/sql-select.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Øyvind Harboe - Can Zylin Consulting help on your project? >>>>> http://www.zylin.com/ >>>> >>>> >>>> >>>> >>>> -- >>>> Øyvind Harboe - Can Zylin Consulting help on your project? >>>> http://www.zylin.com/ >>>> <0001-postgresql-improves-performance-by-taking-advantage-.patch> >>> >>> >> > > > > -- > Øyvind Harboe - Can Zylin Consulting help on your project? > http://www.zylin.com/ >
