Re: JIT compiling with LLVM v10.1

2018-02-19 Thread Jesper Pedersen

Hi,

On 02/14/2018 01:17 PM, Andres Freund wrote:
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> > I've pushed v10.0. The big (and pretty painful to make) change is that
> > now all the LLVM specific code lives in src/backend/jit/llvm, which is
> > built as a shared library which is loaded on demand.



I thought

https://db.in.tum.de/~leis/papers/adaptiveexecution.pdf?lang=en

was relevant for this thread.

Best regards,
 Jesper



Re: JIT compiling with LLVM v10.1

2018-02-15 Thread Andres Freund
Hi,

On 2018-02-15 11:59:46 +0300, Konstantin Knizhnik wrote:
> It is a well-known fact that, in sequential scan queries on warm data,
> Postgres spends much of its time deforming tuples (17% in the case of TPC-H Q1).

I think that the majority of the time therein is not actually
bottlenecked by CPU, but by cache misses.  It might be worthwhile to
repeat your analysis with the last patch of my series applied, and the
#define FASTORDER
uncommented.


> Postgres tries to optimize access to the tuple by caching fixed-size
> offsets to the fields whenever possible and loading attributes on demand.
> It is also a well-known recommendation to put fixed-size, non-null, frequently
> used attributes at the beginning of a table's attribute list to make this
> optimization work more efficiently.

FWIW, I think this optimization causes vastly more trouble than it's
worth.


> The code of heap_deform_tuple shows that the first NULL value
> switches it to "slow" mode:

Note that in most workloads the relevant codepath isn't
heap_deform_tuple but slot_deform_tuple.


> 1. Modern platforms are mostly limited by memory access time; the number of
> executed instructions is less critical.

I don't think this is quite the correct result. Especially because a lot
of time is spent accessing memory, having code that the CPU can execute
out-of-order (by speculatively executing forward) is hugely
beneficial.  Some of the benefit of JITing comes from being able to
start deforming the next field while memory fetches for the previous one
are still ongoing (iff dealing with fixed width cols).


> 2. For a large number of attributes, JIT-ing of tuple deforming can improve
> speed by up to two times, which is quite a good result from my point of view.

+1

Note the last version has a small deficiency in decoding varlena datums
that I need to fix (varsize_any isn't inlined anymore).

Greetings,

Andres Freund



Re: JIT compiling with LLVM v10.1

2018-02-15 Thread Konstantin Knizhnik



On 14.02.2018 21:17, Andres Freund wrote:
> Hi,
>
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> > I've pushed v10.0. The big (and pretty painful to make) change is that
> > now all the LLVM specific code lives in src/backend/jit/llvm, which is
> > built as a shared library which is loaded on demand.
> >
> > The layout is now as follows:
> >
> > src/backend/jit/jit.c:
> >     Part of JITing always linked into the server. Supports loading the
> >     LLVM using JIT library.
> >
> > src/backend/jit/llvm/
> > Infrastructure:
> >   llvmjit.c:
> >     General code generation and optimization infrastructure
> >   llvmjit_error.cpp, llvmjit_wrap.cpp:
> >     Error / backward compat wrappers
> >   llvmjit_inline.cpp:
> >     Cross module inlining support
> > Code-Gen:
> >   llvmjit_expr.c
> >     Expression compilation
> >   llvmjit_deform.c
> >     Deform compilation
>
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
> extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
>
> The other two biggest changes are that I've added a README
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
> and that I've revised the configure support so it does more error
> checks, and moved it into config/llvm.m4.
>
> There's a larger smattering of small changes too.
>
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
>
> Greetings,
>
> Andres Freund



I have made some more experiments with the efficiency of JIT-ing tuple
deforming and I want to share the results (I hope they will be
interesting).
It is a well-known fact that, in sequential scan queries on warm data,
Postgres spends much of its time deforming tuples (17% in the case of
TPC-H Q1).
Postgres tries to optimize access to the tuple by caching fixed-size
offsets to the fields whenever possible and loading attributes on demand.
It is also a well-known recommendation to put fixed-size, non-null,
frequently used attributes at the beginning of a table's attribute list to
make this optimization work more efficiently.
The code of heap_deform_tuple shows that the first NULL value switches it
to "slow" mode:


    for (attnum = 0; attnum < natts; attnum++)
    {
        Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, attnum);

        if (hasnulls && att_isnull(attnum, bp))
        {
            values[attnum] = (Datum) 0;
            isnull[attnum] = true;
            slow = true;        /* can't use attcacheoff anymore */
            continue;
        }
        ...


I tried to investigate the importance of this optimization and the
actual penalty of "slow" mode.
At the same time I want to understand how much JIT helps to speed up
tuple deforming.


I have populated three tables with data:

create table t1(id integer primary key, c1 integer, c2 integer, c3 integer,
    c4 integer, c5 integer, c6 integer, c7 integer, c8 integer, c9 integer);
create table t2(id integer primary key, c1 integer, c2 integer, c3 integer,
    c4 integer, c5 integer, c6 integer, c7 integer, c8 integer, c9 integer);
create table t3(id integer primary key, c1 integer not null,
    c2 integer not null, c3 integer not null, c4 integer not null,
    c5 integer not null, c6 integer not null, c7 integer not null,
    c8 integer not null, c9 integer not null);
insert into t1 (id,c1,c2,c3,c4,c5,c6,c7,c8)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0);
insert into t2 (id,c2,c3,c4,c5,c6,c7,c8,c9)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0);
insert into t3 (id,c1,c2,c3,c4,c5,c6,c7,c8,c9)
    values (generate_series(1,1000),0,0,0,0,0,0,0,0,0);

vacuum analyze t1;
vacuum analyze t2;
vacuum analyze t3;

t1 contains NULL in the last column (c9), t2 in the first column (c1),
and t3 has all attributes declared as not-null (and JIT can use this
knowledge to generate more efficient deforming code).
The whole data set is held in memory (shared buffers size is greater
than the database size) and I intentionally switched off parallel
execution to make the results more deterministic.

I ran two queries calculating aggregates on one/all not-null fields:

select sum(c8) from t*;
select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8) from t*;


As expected, 35% of the time was spent in heap_deform_tuple.
But the results (msec) were slightly confusing and unexpected:

select sum(c8) from t*;

      w/o JIT   with JIT
t1        763        563
t2        772        570
t3        776        592


select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8) from t*;

      w/o JIT   with JIT
t1       1239        742
t2       1233        747
t3       1255        803

I repeat each query 10 times and take the minimal time ( I think that it 

Re: JIT compiling with LLVM v10.1

2018-02-14 Thread Andres Freund
Hi,

On 2018-02-14 23:32:17 +0100, Pierre Ducroquet wrote:
> Here are the LLVM4 and LLVM3.9 compatibility patches.
> Successfully built, and executed some silly queries with JIT forced to make 
> sure it worked.

Thanks!

I'm going to integrate them into my series in the next few days.

Regards,

Andres



Re: JIT compiling with LLVM v10.1

2018-02-14 Thread Pierre Ducroquet
On Wednesday, February 14, 2018 7:17:10 PM CET Andres Freund wrote:
> Hi,
> 
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> > I've pushed v10.0. The big (and pretty painful to make) change is that
> > now all the LLVM specific code lives in src/backend/jit/llvm, which is
> > built as a shared library which is loaded on demand.
> > 
> > The layout is now as follows:
> > 
> > src/backend/jit/jit.c:
> >     Part of JITing always linked into the server. Supports loading the
> >     LLVM using JIT library.
> > 
> > src/backend/jit/llvm/
> > Infrastructure:
> >   llvmjit.c:
> >     General code generation and optimization infrastructure
> >   llvmjit_error.cpp, llvmjit_wrap.cpp:
> >     Error / backward compat wrappers
> >   llvmjit_inline.cpp:
> >     Cross module inlining support
> > Code-Gen:
> >   llvmjit_expr.c
> >     Expression compilation
> >   llvmjit_deform.c
> >     Deform compilation
> 
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
> extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
> 
> The other two biggest changes are that I've added a README
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
> and that I've revised the configure support so it does more error
> checks, and moved it into config/llvm.m4.
> 
> There's a larger smattering of small changes too.
> 
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
> 
> Greetings,
> 
> Andres Freund

Hi

Here are the LLVM4 and LLVM3.9 compatibility patches.
Successfully built, and executed some silly queries with JIT forced to make 
sure it worked.

 Pierre

From c856a5db2f0ba34ba7c230a65f60277ae0e7347f Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 2 Feb 2018 09:11:55 +0100
Subject: [PATCH 1/8] Add support for LLVM4 in llvmjit.c

Signed-off-by: Pierre Ducroquet 
---
 src/backend/jit/llvm/llvmjit.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 7a96ece0f7..7557dc9a19 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -222,13 +222,20 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 
 		addr = 0;
 		if (LLVMOrcGetSymbolAddressIn(handle->stack, &addr, handle->orc_handle, mangled))
-			elog(ERROR, "failed to lookup symbol");
+			elog(ERROR, "failed to lookup symbol %s", mangled);
 		if (addr)
 			return (void *) addr;
 	}
 
 #else
 
+#if LLVM_VERSION_MAJOR < 5
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, mangled)))
+		return (void *) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, mangled)))
+		return (void *) addr;
+	elog(ERROR, "failed to lookup symbol %s for %s", mangled, funcname);
+#else
 	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled))
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
@@ -237,6 +244,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
 		return (void *) addr;
+#endif // LLVM_VERSION_MAJOR
+
 #endif
 
 	elog(ERROR, "failed to JIT: %s", funcname);
@@ -374,11 +383,18 @@ llvm_compile_module(LLVMJitContext *context)
 	 * faster instruction selection mechanism is used.
 	 */
 	{
-		LLVMSharedModuleRef smod;
 		instr_time tb, ta;
 
 		/* emit the code */
 		INSTR_TIME_SET_CURRENT(ta);
+#if LLVM_VERSION_MAJOR < 5
+		orc_handle = LLVMOrcAddEagerlyCompiledIR(compile_orc, context->module,
+			 llvm_resolve_symbol, NULL);
+		// It seems there is no error return from that function in LLVM < 5.
+#else
+		LLVMSharedModuleRef smod;
+
+		LLVMSharedModuleRef smod;
 		smod = LLVMOrcMakeSharedModule(context->module);
 		if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
 		llvm_resolve_symbol, NULL))
@@ -386,6 +402,7 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to jit module");
 		}
 		LLVMOrcDisposeSharedModuleRef(smod);
+#endif
 		INSTR_TIME_SET_CURRENT(tb);
 		INSTR_TIME_SUBTRACT(tb, ta);
 		ereport(DEBUG1, (errmsg("time to emit: %.3fs",
-- 
2.16.1

From a44378f05c33a40c485f26e5f007614100c70fe7 Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 2 Feb 2018 09:13:40 +0100
Subject: [PATCH 2/8] Add LLVM4 support in llvmjit_error.cpp

Signed-off-by: Pierre Ducroquet 
---
 src/backend/jit/llvm/llvmjit_error.cpp | 6 ++
 1 file changed, 6 

Re: JIT compiling with LLVM v10.1

2018-02-14 Thread Andres Freund
Hi,

On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> I've pushed v10.0. The big (and pretty painful to make) change is that
> now all the LLVM specific code lives in src/backend/jit/llvm, which is
> built as a shared library which is loaded on demand.
> 
> The layout is now as follows:
> 
> src/backend/jit/jit.c:
>     Part of JITing always linked into the server. Supports loading the
>     LLVM using JIT library.
> 
> src/backend/jit/llvm/
> Infrastructure:
>   llvmjit.c:
>     General code generation and optimization infrastructure
>   llvmjit_error.cpp, llvmjit_wrap.cpp:
>     Error / backward compat wrappers
>   llvmjit_inline.cpp:
>     Cross module inlining support
> Code-Gen:
>   llvmjit_expr.c
>     Expression compilation
>   llvmjit_deform.c
>     Deform compilation

I've pushed a revised version that hopefully should address Jeff's
wish/need of being able to experiment with this out of core. There's now
a "jit_provider" PGC_POSTMASTER GUC that's by default set to
"llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
set of callbacks via
extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
which can also be implemented by any other potential provider.

The other two biggest changes are that I've added a README
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/jit/README;hb=jit
and that I've revised the configure support so it does more error
checks, and moved it into config/llvm.m4.

There's a larger smattering of small changes too.

I'm pretty happy with how the separation of core / shlib looks now. I'm
planning to work on cleaning and then pushing some of the preliminary
patches (fixed tupledesc, grouping) over the next few days.

Greetings,

Andres Freund