Re: JIT doing duplicative optimization?
Hi, On 2021-11-14 18:46:34 -0500, Tom Lane wrote: > BTW, I realized while working on this that I have little idea what the > "Functions:" count is. Nor does our documentation explain that (or any > other of these numbers), at least not anywhere I could find. That seems > like a pretty serious documentation fail. It's literally the number of functions created in the JIT code. That could be functions for evaluating one expression, or functions for deforming tuples of one type of tupledesc. It's the simplest measure of "amount of JITed stuff" I could find. I'd proposed a more detailed explain option where we'd optionally print details about what exactly is JITed where a while back, but there were some explain output format challenges... > If these numbers aren't important enough to explain in the docs, why are we > printing them at all? Afaict we don't have detailed docs for EXPLAIN output in general? Greetings, Andres Freund
Re: JIT doing duplicative optimization?
Hi, On 2021-11-11 17:02:09 -0300, Alvaro Herrera wrote: > Recently I noticed extremely suspicious behavior from queries under JIT. > A query (details for which sadly I cannot disclose) had an EXPLAIN > looking like this: > > JIT: >Functions: 743 >Options: Inlining true, Optimization true, Expressions true, Deforming true >Timing: Generation 47.013 ms, Inlining 20.277 ms, Optimization 3269.381 > ms, Emission 2392.115 ms, Total 5728.786 ms > Execution Time: 5735.552 ms > > Disabling JIT, the query takes a few tens of milliseconds, so of course > the typical recommendation is to disable JIT. But I noted that the > average time to optimize, per function, is 4.7ms. I didn't know what to > make of this -- is it a reasonable to to take to optimize a single > function? No, that's pretty insane. Any chance you share the generated bitcode for this - it'll have some details about the query / table structure though. Perhaps just the contained function names? > I put that thought aside. But then I spotted a couple of minor things > in the query that could be made better; so I did some of that, and > noticed that the number of functions was reduced from 743 to 671, and > the JIT details in the explain looked like this: > But what really surprised me is that the the average time to optimize > per function is now 2.06ms ... less than half of the previous > measurement. It emits 10% less functions than before, but the time to > both optimize and emit is reduced by 50%. How does that make sense? Without knowing the concrete query I don't know if it it's that. But > I don't know how to approach an investigation into this, but I thought > that maybe we're doing quadratic effort in the optimization / emission > phase or something like that. Yea. One way to investigate it is to enable jit_dump_bitcode and then use llvm's 'opt' tool to see details about optimization times. opt -time-passes -O3 -o /dev/null < /srv/dev/pgdev-dev/3990041.77.bc |less Greetings, Andres Freund
Re: JIT doing duplicative optimization?
I wrote: > You could probably generate some queries with lots and lots of expressions > to characterize this better. If it is O(N^2), it should not be hard to > drive the cost up to the point where the guilty bit of code would stand > out in a perf trace. I experimented with that, using a few different-size queries generated like this: print "explain analyze\n"; for (my $i = 1; $i < 100; $i++) { print " select sum(f1+$i) from base union all\n"; } print "select sum(f1+0) from base;\n"; on a table made like create table base as select generate_series(1,1000) f1; What I got, after setting max_parallel_workers_per_gather = 0, was 10 subqueries: Planning Time: 0.260 ms JIT: Functions: 30 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 4.651 ms, Inlining 8.870 ms, Optimization 152.937 ms, Emis sion 95.046 ms, Total 261.504 ms Execution Time: 15258.249 ms 100 subqueries: Planning Time: 2.231 ms JIT: Functions: 300 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 44.163 ms, Inlining 9.934 ms, Optimization 1448.971 ms, Em ission 928.438 ms, Total 2431.506 ms Execution Time: 154815.515 ms 1000 subqueries: Planning Time: 29.480 ms JIT: Functions: 3000 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 444.479 ms, Inlining 25.688 ms, Optimization 14989.696 ms, Emission 9891.993 ms, Total 25351.856 ms Execution Time: 1522011.367 ms So the overhead looks pretty linear, or even a shade sublinear for the "inlining" bit, *as long as only one process is involved*. However, I noted that if I didn't force that, the JIT overhead went up because the planner wanted to use more workers and each worker has to do its own compilations. So perhaps the apparent nonlinearity in your examples comes from that? BTW, I realized while working on this that I have little idea what the "Functions:" count is. Nor does our documentation explain that (or any other of these numbers), at least not anywhere I could find. That seems like a pretty serious documentation fail. If these numbers aren't important enough to explain in the docs, why are we printing them at all? regards, tom lane
Re: JIT doing duplicative optimization?
Alvaro Herrera writes: > On 2021-Nov-11, Alvaro Herrera wrote: >> But what really surprised me is that the the average time to optimize >> per function is now 2.06ms ... less than half of the previous >> measurement. It emits 10% less functions than before, but the time to >> both optimize and emit is reduced by 50%. How does that make sense? > Ah, here's a query that illustrates what I'm on about. I found this > query[1] in a blog post[2]. > ... > Query 1, 148 functions JIT-compiled. > Average time to optimize, per function 435.153/148 = 2.940ms; > average time to emit per function 282.216/148 = 1.906ms > Query 2, 137 functions JIT-compiled. > Average time to optimize, per function: 374.103/137 = 2.730ms > Average time to emit, per function 254.557 / 137 = 1.858ms > Query 3, 126 functions JIT-compiled. > Average time to optimize per function 229.128 / 126 = 1.181ms > Average time to emit per function 167.338 / 126 = 1.328ms Yeah, in combination with your other measurement, it sure does look like there's something worse-than-linear going on here. The alternative is to assume that the individual functions are more complex in one query than the other, and that seems like a bit of a stretch. You could probably generate some queries with lots and lots of expressions to characterize this better. If it is O(N^2), it should not be hard to drive the cost up to the point where the guilty bit of code would stand out in a perf trace. regards, tom lane
Re: JIT doing duplicative optimization?
On Sun, Nov 14, 2021 at 9:07 AM Alvaro Herrera wrote: > On 2021-Nov-11, Alvaro Herrera wrote: > > > But what really surprised me is that the the average time to optimize > > per function is now 2.06ms ... less than half of the previous > > measurement. It emits 10% less functions than before, but the time to > > both optimize and emit is reduced by 50%. How does that make sense? > > Ah, here's a query that illustrates what I'm on about. I found this > query[1] in a blog post[2]. > > ``` > WITH RECURSIVE typeinfo_tree( > oid, ns, name, kind, basetype, elemtype, elemdelim, > range_subtype, attrtypoids, attrnames, depth) > AS ( > SELECT > ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, > ti.elemtype, ti.elemdelim, ti.range_subtype, > ti.attrtypoids, ti.attrnames, 0 > FROM > ( > SELECT > t.oid AS oid, > ns.nspname AS ns, > t.typname AS name, > t.typtype AS kind, > (CASE WHEN t.typtype = 'd' THEN > (WITH RECURSIVE typebases(oid, depth) AS ( > SELECT > t2.typbasetype AS oid, > 0 AS depth > FROM > pg_type t2 > WHERE > t2.oid = t.oid > UNION ALL > SELECT > t2.typbasetype AS oid, > tb.depth + 1AS depth > FROM > pg_type t2, > typebases tb > WHERE >tb.oid = t2.oid >AND t2.typbasetype != 0 >) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1) >ELSE NULL > END)AS basetype, > t.typelem AS elemtype, > elem_t.typdelim AS elemdelim, > range_t.rngsubtype AS range_subtype, > (CASE WHEN t.typtype = 'c' THEN > (SELECT > array_agg(ia.atttypid ORDER BY ia.attnum) > FROM > pg_attribute ia > INNER JOIN pg_class c > ON (ia.attrelid = c.oid) > WHERE > ia.attnum > 0 AND NOT ia.attisdropped > AND c.reltype = t.oid) > ELSE NULL > END)AS attrtypoids, > (CASE WHEN t.typtype = 'c' THEN > (SELECT > array_agg(ia.attname::text ORDER BY ia.attnum) > FROM > pg_attribute ia > INNER JOIN pg_class c > ON (ia.attrelid = c.oid) > WHERE > ia.attnum > 0 AND NOT ia.attisdropped > AND c.reltype = t.oid) > ELSE NULL > END)AS attrnames > FROM > pg_catalog.pg_type AS t > INNER JOIN pg_catalog.pg_namespace ns ON ( > ns.oid = t.typnamespace) > LEFT JOIN pg_type elem_t ON ( > t.typlen = -1 AND > t.typelem != 0 AND > t.typelem = elem_t.oid > ) > LEFT JOIN pg_range range_t ON ( > t.oid = range_t.rngtypid > ) > ) AS ti > WHERE > ti.oid = any(ARRAY[16,17]::oid[]) > > UNION ALL > > SELECT > ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, > ti.elemtype, ti.elemdelim, ti.range_subtype, > ti.attrtypoids, ti.attrnames, tt.depth + 1 > FROM > ( > SELECT > t.oid AS oid, > ns.nspname AS ns, > t.typname AS name, > t.typtype AS kind, > (CASE WHEN t.typtype = 'd' THEN > (WITH RECURSIVE typebases(oid, depth) AS ( > SELECT > t2.typbasetype AS oid, > 0 AS depth > FROM > pg_type t2 > WHERE > t2.oid = t.oid > UNION ALL > SELECT > t2.typbasetype AS oid, > tb.depth + 1AS depth > FROM > pg_type t2, > typebases tb > WHERE >tb.oid = t2.oid >AND t2.typbasetype != 0 >) SELECT oid FROM typebases ORDER BY depth DESC LI
Re: JIT doing duplicative optimization?
On 2021-Nov-11, Alvaro Herrera wrote: > But what really surprised me is that the the average time to optimize > per function is now 2.06ms ... less than half of the previous > measurement. It emits 10% less functions than before, but the time to > both optimize and emit is reduced by 50%. How does that make sense? Ah, here's a query that illustrates what I'm on about. I found this query[1] in a blog post[2]. ``` WITH RECURSIVE typeinfo_tree( oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids, attrnames, depth) AS ( SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, 0 FROM ( SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = 'd' THEN (WITH RECURSIVE typebases(oid, depth) AS ( SELECT t2.typbasetype AS oid, 0 AS depth FROM pg_type t2 WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + 1AS depth FROM pg_type t2, typebases tb WHERE tb.oid = t2.oid AND t2.typbasetype != 0 ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1) ELSE NULL END)AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype, (CASE WHEN t.typtype = 'c' THEN (SELECT array_agg(ia.atttypid ORDER BY ia.attnum) FROM pg_attribute ia INNER JOIN pg_class c ON (ia.attrelid = c.oid) WHERE ia.attnum > 0 AND NOT ia.attisdropped AND c.reltype = t.oid) ELSE NULL END)AS attrtypoids, (CASE WHEN t.typtype = 'c' THEN (SELECT array_agg(ia.attname::text ORDER BY ia.attnum) FROM pg_attribute ia INNER JOIN pg_class c ON (ia.attrelid = c.oid) WHERE ia.attnum > 0 AND NOT ia.attisdropped AND c.reltype = t.oid) ELSE NULL END)AS attrnames FROM pg_catalog.pg_type AS t INNER JOIN pg_catalog.pg_namespace ns ON ( ns.oid = t.typnamespace) LEFT JOIN pg_type elem_t ON ( t.typlen = -1 AND t.typelem != 0 AND t.typelem = elem_t.oid ) LEFT JOIN pg_range range_t ON ( t.oid = range_t.rngtypid ) ) AS ti WHERE ti.oid = any(ARRAY[16,17]::oid[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + 1 FROM ( SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = 'd' THEN (WITH RECURSIVE typebases(oid, depth) AS ( SELECT t2.typbasetype AS oid, 0 AS depth FROM pg_type t2 WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + 1AS depth FROM pg_type t2, typebases tb WHERE tb.oid = t2.oid AND t2.typbasetype != 0 ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1) ELSE NULL END)AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype, (CASE WHEN t.typtype