On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> Hello, > > The attached patch is a revised version of pass-down LIMIT to FDW/CSP. > > Below is the updates from the last version. > > 'ps_numTuples' of PlanState was declared as uint64, instead of long > to avoid problems on 32bits machine when a large LIMIT clause is > supplied. > > 'ps_numTuples' is re-interpreted; 0 means that its upper node wants > to fetch all the tuples. It allows to eliminate a boring initialization > on ExecInit handler for each executor node. > > Even though it was not suggested, estimate_path_cost_size() of postgres_fdw > adjusts number of rows if foreign-path is located on top-level of > the base-relations and LIMIT clause takes a constant value. > It will make more adequate plan as follows: > > * WITHOUT this patch > -------------------- > postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id > and t_a.x < t_b.x LIMIT 100; > QUERY PLAN > ------------------------------------------------------------ > ---------------------------- > Limit (cost=261.17..274.43 rows=100 width=88) > Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y > -> Hash Join (cost=261.17..581.50 rows=2416 width=88) > Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y > Hash Cond: (t_a.id = t_b.id) > Join Filter: (t_a.x < t_b.x) > -> Foreign Scan on public.t_a (cost=100.00..146.12 rows=1204 > width=44) > Output: t_a.id, t_a.x, t_a.y > Remote SQL: SELECT id, x, y FROM public.t > -> Hash (cost=146.12..146.12 rows=1204 width=44) > Output: t_b.id, t_b.x, t_b.y > -> Foreign Scan on public.t_b (cost=100.00..146.12 > rows=1204 width=44) > Output: t_b.id, t_b.x, t_b.y > Remote SQL: SELECT id, x, y FROM public.t > (14 rows) > > * WITH this patch > ----------------- > postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id > and t_a.x < t_b.x LIMIT 100; > > QUERY PLAN > ------------------------------------------------------------ > -------------------------------------------------- > Limit (cost=100.00..146.58 rows=100 width=88) > Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y > -> Foreign Scan (cost=100.00..146.58 rows=100 width=88) > Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y > Relations: (public.t_a) INNER JOIN (public.t_b) > Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM > (public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id = > r2.id)))) > (6 rows) > > > That's nice. > On the other hands, I noticed it is not safe to attach LIMIT clause at > the planner stage because root->limit_tuples is declared as double. > Even if LIMIT clause takes a constant value, it is potentially larger > than 2^53 which is the limitation we can represent accurately with > float64 data type but LIMIT clause allows up to 2^63-1. > So, postgres_fdw now attaches LIMIT clause on the remote query on > execution time only. > I think, it's OK. Here are few comments on latest patch: 1. make/make check is fine, however I am getting regression failure in postgres_fdw contrib module (attached regression.diff). Please investigate and fix. 2. + * + * MEMO: root->limit_tuples is not attached when query contains + * grouping-clause or aggregate functions. So, we don's adjust + * rows even if LIMIT <const> is supplied. Can you please explain why you are not doing this for grouping-clause or aggregate functions. 3. Typo: don's => don't Rest of the changes look good to me. Thanks > Thanks, > ---- > PG-Strom Project / NEC OSS Promotion Center > KaiGai Kohei <kai...@ak.jp.nec.com> > > > > -----Original Message----- > > From: Robert Haas [mailto:robertmh...@gmail.com] > > Sent: Thursday, November 10, 2016 3:08 AM > > To: Kaigai Kouhei(海外 浩平) <kai...@ak.jp.nec.com> > > Cc: pgsql-hackers@postgresql.org; Jeevan Chalke > > <jeevan.cha...@enterprisedb.com>; Etsuro Fujita > > <fujita.ets...@lab.ntt.co.jp>; Andres Freund <and...@anarazel.de> > > Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan > > [take-2] > > > > On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> > > wrote: > > > As an example, I enhanced postgres_fdw to understand the ps_numTuples > > > if it is set. If and when remote ORDER BY is pushed down, the latest > > > code tries to sort the entire remote table because it does not know > > > how many rows to be returned. Thus, it took larger execution time. > > > On the other hands, the patched one runs the remote query with LIMIT > > > clause according to the ps_numTuples; which is informed by the Limit > > > node on top of the ForeignScan node. > > > > So there are two cases here. If the user says LIMIT 12, we could in > theory > > know that at planner time and optimize accordingly. If the user says > LIMIT > > twelve(), however, we will need to wait until execution time unless > twelve() > > happens to be capable of being simplified to a constant by the planner. > > > > Therefore, it's possible to imagine having two mechanisms here. In the > > simple case where the LIMIT and OFFSET values are constants, we could > > implement a system to get hold of that information during planning and > > use it for whatever we like. In addition, we can have an > > execution-time system that optimizes based on values available at > execution > > (regardless of whether those values were also available during planning). > > Those are, basically, two separate things, and this patch has enough to > > do just focusing on one of them. > > > > -- > > Robert Haas > > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > > Company > -- Jeevan Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
regression.diffs
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers