On 24.01.2018 10:20, Andres Freund wrote:
Hi,
I've spent the last weeks working on my LLVM compilation patchset. In
the course of that I *heavily* revised it. While still a good bit away
from committable, it's IMO definitely not a prototype anymore.
There are too many small changes, so I'm only going to list the major
things. A good bit of that is new. The actual LLVM IR emission itself
hasn't changed that drastically. Since I've not described it in detail
before, I'll describe things from scratch in a few cases, even if they
haven't fully changed.
== JIT Interface ==
To avoid emitting code in very small increments (which increases
mmap/mremap rw-vs-exec remapping as well as compile/optimization time),
code generation doesn't happen for every single expression individually,
but in batches.
The basic object to emit code via is a jit context created with:
extern LLVMJitContext *llvm_create_context(bool optimize);
which, in the case of expressions, is stored on demand in the EState.
For other use cases that might not be the right location.
To emit LLVM IR (i.e. the portable code that LLVM then optimizes and
generates native code for), one gets a module from that context with:
extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);
to which "arbitrary" numbers of functions can be added. In the case of
expression evaluation, we get the module once for every expression, and
emit one function for the expression itself, and one for every
applicable/referenced deform function.
As explained above, we do not want to emit code immediately from within
ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed
expression sets the function pointer to a callback, which gets the
actual native function on the first actual call. That allows batching
together the generation of all native functions that are defined before
the first expression is evaluated - in a lot of queries that'll be all
of them.
Said callback then calls
extern void *llvm_get_function(LLVMJitContext *context, const char
*funcname);
which'll emit code for the "in progress" mutable module if necessary,
and then searches all generated functions for the name. The names are
currently "evalexpr" and "deform" with a generation and counter suffix.
Currently, expressions which do not have access to an EState, basically
all "parent"-less expressions, aren't JIT compiled. That could be
changed, but so far I do not see a huge need.
Hi,
As far as I understand, generation of native code is now always done for
all supported expressions, individually by each backend.
I wonder whether it would be useful to put more effort into deciding
when compilation to native code should be done and when interpretation
is better.
For example, many JIT-able languages like Lua use traces, i.e. a query
is first interpreted and a trace is generated. If the same trace is
followed more than N times, then native code is generated for it.
In the context of a DBMS executor, it is obvious that only frequently
executed or expensive queries have to be compiled.
So we can use the estimated plan cost and the number of query executions
as simple criteria for JIT-ing a query.
Maybe compilation of simple queries (with small cost) should be done
only for prepared statements...
Another question is whether it is sensible to redundantly do expensive
work (LLVM compilation) in all backends.
This question mostly concerns a shared prepared statement cache. But
even without such a cache, it seems possible to use some signature of
the compiled expression as the library name and thereby share these
libraries between backends. So before starting code generation,
ExecReadyCompiledExpr could first build the signature and check whether
the corresponding library is already present.
It would also be easier to control the space used by compiled libraries
in this case.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company