On 24.01.2018 10:20, Andres Freund wrote:
Hi,

I've spent the last weeks working on my LLVM compilation patchset. In
the course of that I *heavily* revised it. While still a good bit away
from committable, it's IMO definitely not a prototype anymore.

There are too many small changes, so I'm only going to list the major
things. A good bit of that is new; the actual LLVM IR emission itself
hasn't changed that drastically.  Since I've not described them in
detail before I'll describe things from scratch in a few cases, even if
they haven't fully changed.


== JIT Interface ==

To avoid emitting code in very small increments (increases mmap/mremap
rw vs exec remapping, compile/optimization time), code generation
doesn't happen for every single expression individually, but in batches.

The basic object through which code is emitted is a JIT context, created
with:
   extern LLVMJitContext *llvm_create_context(bool optimize);
which, in the case of expression evaluation, is stored on demand in the
EState. For other use cases that might not be the right location.

To emit LLVM IR (i.e. the portable representation that LLVM then
optimizes and generates native code for), one gets a module from that with:
   extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);

to which "arbitrary" numbers of functions can be added. In the case of
expression evaluation, we get the module once for every expression, and
emit one function for the expression itself, plus one for every
applicable/referenced deform function.

As explained above, we do not want to emit code immediately from within
ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed
expression instead sets the function pointer to a callback, which
resolves the actual native function on the first call.  That allows us
to batch together the generation of all native functions that are
defined before the first expression is evaluated - in a lot of queries
that'll be all of them.

Said callback then calls
   extern void *llvm_get_function(LLVMJitContext *context, const char 
*funcname);
which'll emit code for the "in progress" mutable module if necessary,
and then searches all generated functions for the name. The names are
created via
   extern char *llvm_expand_funcname(LLVMJitContext *context, const char 
*basename);
currently "evalexpr" and "deform" with a generation and counter suffix.

Currently expressions which do not have access to an EState - basically
all "parent"-less expressions - aren't JIT compiled. That could be
changed, but so far I do not see a huge need.

Hi,

As far as I understand, generation of native code is now always done for all supported expressions, and individually by each backend. I wonder whether it would be useful to put more effort into deciding when compilation to native code is worthwhile and when interpretation is better. For example, many JIT-able languages, such as Lua, use traces: a query is first interpreted and a trace is generated; if the same trace is followed more than N times, native code is generated for it.

In the context of a DBMS executor it is obvious that only frequently executed or expensive queries need to be compiled. So we can use the estimated plan cost and the number of query executions as simple criteria for JIT-ing a query. Maybe compilation of simple queries (with small cost) should be done only for prepared statements...

Another question is whether it is sensible to redundantly do expensive work (LLVM compilation) in all backends. This question relates to a shared prepared statement cache. But even without such a cache, it seems possible to derive the library name from a signature of the compiled expression and share these libraries between backends. So before starting code generation, ExecReadyCompiledExpr could first build the signature and check whether the corresponding library is already present. It would also be easier to control the space used by compiled libraries in this case.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

