Re: TupleTableSlot abstraction

2019-02-26 Thread Andres Freund
On 2019-02-27 15:42:50 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> > Im not sure I understand. How can adding a memory context + reset to
> > ctas and matview receivers negatively impact other dest receivers?
> 
> I don't think you got my point here: imagine that a plugin use the
> current receiveSlot logic from createas.c or matview.c, and forgets to
> free the tuple copied.  On v11, that works fine.  On current HEAD,
> they win silently a new leak.

The copy was made in intorel_receive()?



Re: TupleTableSlot abstraction

2019-02-26 Thread Michael Paquier
On Tue, Feb 26, 2019 at 10:38:45PM -0800, Andres Freund wrote:
> Im not sure I understand. How can adding a memory context + reset to
> ctas and matview receivers negatively impact other dest receivers?

I don't think you got my point here: imagine that a plugin use the
current receiveSlot logic from createas.c or matview.c, and forgets to
free the tuple copied.  On v11, that works fine.  On current HEAD,
they win silently a new leak.
--
Michael


signature.asc
Description: PGP signature


Re: TupleTableSlot abstraction

2019-02-26 Thread Andres Freund
Hi,

On 2019-02-27 15:34:07 +0900, Michael Paquier wrote:
> On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> > I'm not so sure that's the architecturally correct fix however. Is it
> > actually guaranteed, given expanded tuples, toasting, etc, that there's
> > no other memory leak here? I wonder if we shouldn't work twoards using a
> > short lived memory context here. Note how e.g. printtup() uses a short
> > lived context for its work.
> 
> Perhaps.  I got to wonder if this change would not impact code using
> their own DestReceiver, resulting in similar leaks when they insert
> tuples on-the-fly.

Im not sure I understand. How can adding a memory context + reset to
ctas and matview receivers negatively impact other dest receivers?


> I was playing a bit with some refactoring of relation creation for
> CTAS in the scope of temporary matviews, and noticed this issue on the
> CF list, so that was a bit annoying, and issues like that tend to be
> easily forgotten..

It's been 10 days since the report, nobody pinged, and obviously I'm
working on pluggable storage, so ...

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2019-02-26 Thread Michael Paquier
On Tue, Feb 26, 2019 at 09:42:38PM -0800, Andres Freund wrote:
> I'm not so sure that's the architecturally correct fix however. Is it
> actually guaranteed, given expanded tuples, toasting, etc, that there's
> no other memory leak here? I wonder if we shouldn't work twoards using a
> short lived memory context here. Note how e.g. printtup() uses a short
> lived context for its work.

Perhaps.  I got to wonder if this change would not impact code using
their own DestReceiver, resulting in similar leaks when they insert
tuples on-the-fly.  Such issues can be surprising for fork an plugin
developers.  I was playing a bit with some refactoring of relation
creation for CTAS in the scope of temporary matviews, and noticed this
issue on the CF list, so that was a bit annoying, and issues like that
tend to be easily forgotten..
--
Michael


signature.asc
Description: PGP signature


Re: TupleTableSlot abstraction

2019-02-26 Thread Andres Freund
Hi,

On 2019-02-27 14:21:52 +0900, Michael Paquier wrote:
> On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> > By blind analogy to the changes made to matview.c, I think that createas.c
> > is missing a heap_freetuple, as attached.

First, sorry to have been slow answering. I was whacking around code
further modifying this, and thought I'd just solve the immediate issue
here by committing the followup work that removes getting the tuple out
of the slot entirely. That took longer than planned, so it makes sense
to commit an interim fix.


> I think that you are right.  CTAS and materialized views share a lot
> when it comes to relation creation and initial table loading.  I have
> reproduced the leak and could notice that your fix is correct.  So
> committed.

I'm not so sure that's the architecturally correct fix however. Is it
actually guaranteed, given expanded tuples, toasting, etc, that there's
no other memory leak here? I wonder if we shouldn't work twoards using a
short lived memory context here. Note how e.g. printtup() uses a short
lived context for its work.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2019-02-26 Thread Michael Paquier
On Sat, Feb 16, 2019 at 05:07:44PM -0500, Jeff Janes wrote:
> By blind analogy to the changes made to matview.c, I think that createas.c
> is missing a heap_freetuple, as attached.

I think that you are right.  CTAS and materialized views share a lot
when it comes to relation creation and initial table loading.  I have
reproduced the leak and could notice that your fix is correct.  So
committed.
--
Michael


signature.asc
Description: PGP signature


Re: TupleTableSlot abstraction

2019-02-16 Thread Jeff Janes
On Fri, Nov 16, 2018 at 7:46 PM Andres Freund  wrote:

> Hi,
>
> On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> > What I'm now planning to do is to go through the big comment in
> > tuptable.h and update that to the new world.  While I'm not convinced
> > that that that's the best place for it, it needs to be accurate.
> >
> > Furthermore:
> > - More comment polishing
> > - I'll probably split the commits up a bit further (particulary JIT
> >   ignoring virtual tuple slots, inlining the hot path of
> >   slot_getsomeattrs())
> > - serious commit message polishing
>
> I've done all of that now, and pushed it.  Thanks Ashutosh, Amit
> Khandekar and everyone else.
>

Hi Andres and all,

This commit 763f2edd92095b1ca2 "Rejigger materializing and fetching a
HeapTuple from a slot" introduces a memory leak into the ExecutorState
context which is invoked by this statement:

CREATE TABLE tbloom AS
   SELECT
 (random() * 100)::int as i1,
 (random() * 100)::int as i2,
 (random() * 100)::int as i3,
 (random() * 100)::int as i4,
 (random() * 100)::int as i5,
 (random() * 100)::int as i6
   FROM
  generate_series(1,5000);

By blind analogy to the changes made to matview.c, I think that createas.c
is missing a heap_freetuple, as attached.

It fixes the leak, and still passes "make check".
Cheers,

Jeff


createas_leak_fix.patch
Description: Binary data


Re: TupleTableSlot abstraction

2018-11-19 Thread Amit Khandekar
On Wed, 14 Nov 2018 at 05:00, Andres Freund  wrote:
> After this, I hope Amit Khandekar will rebase a patch he's sent me
> internally that converts triggers to use slots. I'll work on rebasing
> the pluggable storage patch ontop of this.

Shared this patch in a separate mail thread :
https://www.postgresql.org/message-id/CAJ3gD9fjpoPHSHB-Ufj7ciT8nV0JSA2gdJUdtxo-bMyPrpjk%3DQ%40mail.gmail.com
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-11-16 Thread Andres Freund
Hi,

On 2018-11-13 15:30:21 -0800, Andres Freund wrote:
> What I'm now planning to do is to go through the big comment in
> tuptable.h and update that to the new world.  While I'm not convinced
> that that that's the best place for it, it needs to be accurate.
> 
> Furthermore:
> - More comment polishing
> - I'll probably split the commits up a bit further (particulary JIT
>   ignoring virtual tuple slots, inlining the hot path of
>   slot_getsomeattrs())
> - serious commit message polishing

I've done all of that now, and pushed it.  Thanks Ashutosh, Amit
Khandekar and everyone else.

On to pluggable storage...

Regards,

Andres



Re: TupleTableSlot abstraction

2018-10-16 Thread Andres Freund
Hi,

On 2018-10-15 12:12:03 +0530, Amit Khandekar wrote:
> On Sat, 13 Oct 2018 at 04:02, Andres Freund  wrote:
> > I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> > no reason for it to be inline.

> Can you explain more why you think there should be a ExecEvalSysVar()
> definition ? As I mentioned earlier it would just call
> slot_getsysattr() and do nothing else.

I think it's superflous to have three opcodes for this, and we should
instead just have one sysvar instruction that chooses the slot based on
a parameter of the opcode.  Inline code in the interpreter isn't free.


> > And it's simpler for JIT than the alternative.
> 
> You mean it would be simpler for JIT to call ExecEvalSysVar() than
> slot_getsysattr() ? I didn't get why it is simpler.

Because there'd be no special code at all. We can just use the code that
existing ExecEval* routines use.


> Or are you talking considering build_ExecEvalSysVar() ? I am ok with
> retaining build_ExecEvalSysVar() , but I was saying even inside this
> function, we could do :
> LLVMBuildCall( , llvm_get_decl(mod, FuncSlotGetsysattr) , .)
> rather than:
> LLVMFunctionType(,...)
> LLVMAddFunction("ExecEvalSysVar", )
> LLVMBuildCall(...)

That'd probably generate more work than it'd save, because llvm_get_decl
requires that the function is present in llvmjit_types.c.


> > > I went ahead and did these changes, but for now, I haven't replaced
> > > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > > like these names, but until we have concluded, I don't want to go
> > > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > > ExecFetchSlotHeapTuple().
> >
> > Why not?
> 
> I haven't gone ahead because I wanted to know if you are ok with the names.

Just renaming a function is cheap, it's just a oneliner script ;)

FWIW, I dislike ExecFetchGenericSlotTuple() as well. It's still a
HeapTuple, there's absolutely nothing Generic about it.  I don't see why
we'd not just use ExecFetchSlotHeapTuple() with a new shouldFree
parameter?

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-10-16 Thread Amit Khandekar
On Tue, 9 Oct 2018 at 20:46, Amit Khandekar  wrote:
> There is still one more regression test failure in polygon.sql which I
> am yet to analyze.

Below is a narrowed down testcase which reproduces the failure in polygon.sql :

create table tab (n int, dist int, id integer);
insert into tab values (1, 2, 3);

-- Force a hash join
set enable_nestloop TO f;
set enable_mergejoin TO f;

-- Expected to return a row
SELECT * FROM tab t1 , tab t2 where t1.id = t2.id;
 n | dist | id | n | dist | id
---+--++---+--+
(0 rows)

In MultiExecPrivateHash(), to generate the hash table, the tuples are
retrieved one by one from the scan of outer plan state. For each
tuple, ExecHashGetHashValue() is called to get the join attribute
value of the tuple. Here, when the attribute is retrieved by a
jit-compiled slot-deforming function built by slot_compile_deform(),
the attribute value is a junk value. So the hash join condition fails
and the join returns no rows.

Root cause :

In llvm_compile_expr(), for the switch case : EEOP_INNER_FETCHSOME,
innerPlanState(parent)->ps_ResultTupleSlot->tts_cb is passed to
slot_compile_deform(). And in slot_compile_deform(), this tts_cb is
used to determine whether the inner slot is a minimal slot or a
buffer/heap tuple slot, and accordingly v_tupleheaderp is calculated
using StructMinimalTupleTableSlot or StructHeapTupleTableSlot.

In the above hash join scenario, the ps_ResultTupleSlot is a minimal tuple slot.
But at runtime, when MultiExecPrivateHash()=>ExecHashGetHashValue() is
called, the slot returned by outer node (Seqscan) is a buffer heap
tuple slot; this is because the seq scan does not return using its
ps_ResultTupleSlot, instead it directly returns its scan slot since
there is no projection info needed. Hence the tuple is retrieved using
a wrong offset inside the Tuple table slot, because the jit function
was compiled assuming it's going to be a minimal tuple slot.

So, although we can safely use
innerPlanState(parent)->ps_ResultTupleSlot to get the tuple descriptor
for slot_compile_deform(), we should not use the same tuple slot to
know what kind of a tuple slot it will be. That can be known only at
runtime.

Possible Fix :

I am thinking, in slot_compile_deform(), we need to include the logic
instructions to determine the slot type. Have a new
FIELDNO_TUPLETABLESLOT_OPS to retrieve TupleTableSlot.tts_cb, and then
accordingly calculate the tuple offset. I am not sure if this will
turn out to have a performance impact on jit execution, or whether it
is feasible to do such conditional thing in llvm; trying to
understand.

Comments ?


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-10-15 Thread Amit Khandekar
On Sat, 13 Oct 2018 at 04:02, Andres Freund  wrote:
> > > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > > >   Datum   iDatum;
> > > >   boolisNull;
> > > >
> > > > - if (keycol != 0)
> > > > + if (keycol < 0)
> > > > + {
> > > > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot 
> > > > *)slot;
> > > > +
> > > > + /* Only heap tuples have system attributes. */
> > > > + Assert(TTS_IS_HEAPTUPLE(slot) || 
> > > > TTS_IS_BUFFERTUPLE(slot));
> > > > +
> > > > + iDatum = heap_getsysattr(hslot->tuple, keycol,
> > > > +  
> > > > slot->tts_tupleDescriptor,
> > > > +  
> > > > );
> > > > + }
> > > > + else if (keycol != 0)
> > > >   {
> > > >   /*
> > > >* Plain index column; get the value we need 
> > > > directly from the
> > >
> > > This now should access the system column via the slot, right?  There's
> > > other places like this IIRC.
> >
> > Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
> > slot_getsysattr() now.
> >
> > In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
> > planning to remove this definition since it would be a single line
> > function just calling slot_getsysattr().
> >
> > In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
> > haven't removed the definition yet. I am planning to create a new
> > LLVMValueRef FuncSlotGetsysattr, and use that instead, in
> > build_ExecEvalSysVar(), or for that matter, I am thinking to revert
> > back build_ExecEvalSysVar() and instead have that code inline as in
> > HEAD.
>
> I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
> no reason for it to be inline.
Can you explain more why you think there should be a ExecEvalSysVar()
definition ? As I mentioned earlier it would just call
slot_getsysattr() and do nothing else.

> And it's simpler for JIT than the alternative.

You mean it would be simpler for JIT to call ExecEvalSysVar() than
slot_getsysattr() ? I didn't get why it is simpler.

Or are you talking considering build_ExecEvalSysVar() ? I am ok with
retaining build_ExecEvalSysVar() , but I was saying even inside this
function, we could do :
LLVMBuildCall( , llvm_get_decl(mod, FuncSlotGetsysattr) , .)
rather than:
LLVMFunctionType(,...)
LLVMAddFunction("ExecEvalSysVar", )
LLVMBuildCall(...)


>
> > > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,/* tuple 
> > > > table */
> > > >   {
> > > >   TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> > > >
> > > > + slot->tts_cb->release(slot);
> > > >   /* Always release resources and reset the slot to empty */
> > > >   ExecClearTuple(slot);
> > > >   if (slot->tts_tupleDescriptor)
> > > > @@ -240,6 +1076,7 @@ void
> > > >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > > >  {
> > > >   /* This should match ExecResetTupleTable's processing of one slot 
> > > > */
> > > > + slot->tts_cb->release(slot);
> > > >   Assert(IsA(slot, TupleTableSlot));
> > > >   ExecClearTuple(slot);
> > > >   if (slot->tts_tupleDescriptor)
> > >
> > > ISTM that release should be called *after* clearing the slot.
> >
> > I am copying here what I discussed about this in the earlier reply:
> >
> > I am not sure what was release() designed to do. Currently all of the
> > implementers of this function are empty.
>
> So additional deallocations can happen in a slot. We might need this
> e.g. at some point for zheap which needs larger, longer-lived, buffers
> in slots.
>
> > Was it meant for doing
> > ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
> > ReleaseBuffer(bslot->buffer) ?
>
> No. The former lives in generic code, the latter is in ClearTuple.
>
> > I think the purpose of keeping this *before* clearing the tuple might
> > be because the clear() might have already cleared some handles that
> > release() might need.
>
> It's just plainly wrong to call it this way round.

Ok.

>
>
> > I went ahead and did these changes, but for now, I haven't replaced
> > ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> > retained ExecFetchSlotTuple() to be called for heap tuples, and added
> > a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
> > like these names, but until we have concluded, I don't want to go
> > ahead and replace all the numerous ExecFetchSlotTuple() calls with
> > ExecFetchSlotHeapTuple().
>
> Why not?

I haven't gone ahead because I wanted to know if you are ok with the names.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-10-12 Thread Andres Freund
On 2018-10-09 20:46:04 +0530, Amit Khandekar wrote:
> On Wed, 26 Sep 2018 at 05:35, Andres Freund  wrote:
> > > +
> > > +/*
> > > + * This is a function used by all getattr() callbacks which deal with a 
> > > heap
> > > + * tuple or some tuple format which can be represented as a heap tuple 
> > > e.g. a
> > > + * minimal tuple.
> > > + *
> > > + * heap_getattr considers any attnum beyond the attributes available in 
> > > the
> > > + * tuple as NULL. This function however returns the values of missing
> > > + * attributes from the tuple descriptor in that case. Also this function 
> > > does
> > > + * not support extracting system attributes.
> > > + *
> > > + * If the attribute needs to be fetched from the tuple, the function 
> > > fills in
> > > + * tts_values and tts_isnull arrays upto the required attnum.
> > > + */
> > > +Datum
> > > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 
> > > *offp,
> > > + int attnum, bool
> > > *isnull)
> >
> > I'm still *vehemently* opposed to the introduction of this.
> 
> You mean, you want to remove the att_isnull() optimization, right ?

Yes.


> Removed that code now. Directly deforming the tuple regardless of the
> null attribute.

Good, thanks.


> > > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > >   Datum   iDatum;
> > >   boolisNull;
> > >
> > > - if (keycol != 0)
> > > + if (keycol < 0)
> > > + {
> > > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot 
> > > *)slot;
> > > +
> > > + /* Only heap tuples have system attributes. */
> > > + Assert(TTS_IS_HEAPTUPLE(slot) || 
> > > TTS_IS_BUFFERTUPLE(slot));
> > > +
> > > + iDatum = heap_getsysattr(hslot->tuple, keycol,
> > > +  
> > > slot->tts_tupleDescriptor,
> > > +  
> > > );
> > > + }
> > > + else if (keycol != 0)
> > >   {
> > >   /*
> > >* Plain index column; get the value we need 
> > > directly from the
> >
> > This now should access the system column via the slot, right?  There's
> > other places like this IIRC.
> 
> Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
> slot_getsysattr() now.
> 
> In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
> planning to remove this definition since it would be a single line
> function just calling slot_getsysattr().
> 
> In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
> haven't removed the definition yet. I am planning to create a new
> LLVMValueRef FuncSlotGetsysattr, and use that instead, in
> build_ExecEvalSysVar(), or for that matter, I am thinking to revert
> back build_ExecEvalSysVar() and instead have that code inline as in
> HEAD.

I'd just have ExecInterpExpr() continue calling ExecEvalSysVar. There's
no reason for it to be inline. And it's simpler for JIT than the
alternative.

> > > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,/* tuple 
> > > table */
> > >   {
> > >   TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> > >
> > > + slot->tts_cb->release(slot);
> > >   /* Always release resources and reset the slot to empty */
> > >   ExecClearTuple(slot);
> > >   if (slot->tts_tupleDescriptor)
> > > @@ -240,6 +1076,7 @@ void
> > >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > >  {
> > >   /* This should match ExecResetTupleTable's processing of one slot */
> > > + slot->tts_cb->release(slot);
> > >   Assert(IsA(slot, TupleTableSlot));
> > >   ExecClearTuple(slot);
> > >   if (slot->tts_tupleDescriptor)
> >
> > ISTM that release should be called *after* clearing the slot.
> 
> I am copying here what I discussed about this in the earlier reply:
> 
> I am not sure what was release() designed to do. Currently all of the
> implementers of this function are empty.

So additional deallocations can happen in a slot. We might need this
e.g. at some point for zheap which needs larger, longer-lived, buffers
in slots.

> Was it meant for doing
> ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
> ReleaseBuffer(bslot->buffer) ?

No. The former lives in generic code, the latter is in ClearTuple.

> I think the purpose of keeping this *before* clearing the tuple might
> be because the clear() might have already cleared some handles that
> release() might need.

It's just plainly wrong to call it this way round.


> I went ahead and did these changes, but for now, I haven't replaced
> ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
> retained ExecFetchSlotTuple() to be called for heap tuples, and added
> a new ExecFetchGenericSlotTuple() 

Re: TupleTableSlot abstraction

2018-10-11 Thread Andres Freund
Hi,

On 2018-10-01 22:21:58 -0400, Tom Lane wrote:
> Kyotaro HORIGUCHI  writes:
> > At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund  
> > wrote in <20180925234509.3hrrf6tmvy5tf...@alap3.anarazel.de>
> >> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> >>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> >>> This reduces the size of TupleTableSlot since each bool member takes
> >>> at least a byte on the platforms where bool is defined as a char.
> 
> > About bitfields, an advantage of it is debugger awareness. We
> > don't need to look aside to the definitions of bitwise macros
> > while using debugger. And the current code is preserved in
> > appearance by using it.
> 
> FWIW, I would expect a change like this to be a net loss performance-wise
> on most platforms.  Testing the contents of a byte-wide variable is pretty
> cheap on any architecture invented later than ~ 1970.  Testing a bit,
> though, requires a masking operation that is not free.  I am not seeing
> how making TupleTableSlot a little smaller buys that back ... we don't
> normally have that many active slots in a plan.

I measured it as a speedup on x86-64, mainly because we require fewer
instructions to reset a slot into an empty state, but also because there
are fewer loads. Masking a register is just about free, loading from
memory isn't, even if the cacheline is in L1.  The other benefit is that
this allows TupleTableSlots to fit into one cacheline more often, and
that's noticable in cache miss rates.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-10-10 Thread Amit Khandekar
On Wed, 26 Sep 2018 at 05:15, Andres Freund  wrote:
>
> Hi,
>
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> >
> > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> > This reduces the size of TupleTableSlot since each bool member takes
> > at least a byte on the platforms where bool is defined as a char.
> >
> > Ashutosh Bapat and Andres Freund
>
> > +
> > +/* true = slot is empty */
> > +#define  TTS_ISEMPTY (1 << 1)
> > +#define IS_TTS_EMPTY(slot)   ((slot)->tts_flags & TTS_ISEMPTY)
> > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)
>
> The flag stuff is the right way, but I'm a bit dubious about the
> accessor functions.  I can see open-coding the accesses, using the
> macros, or potentially going towards using bitfields.
>
> Any comments?

I like this abstraction using macros, since this will allow us to
conveniently change the way this information is stored inside the
slot. I think for this same reason we have defined macros
(HeapTupleIsHotUpdated, etc) for each bit of heaptuple infomask.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-10-09 Thread Amit Khandekar
On Wed, 26 Sep 2018 at 05:35, Andres Freund  wrote:
> > +
> > +/*
> > + * This is a function used by all getattr() callbacks which deal with a 
> > heap
> > + * tuple or some tuple format which can be represented as a heap tuple 
> > e.g. a
> > + * minimal tuple.
> > + *
> > + * heap_getattr considers any attnum beyond the attributes available in the
> > + * tuple as NULL. This function however returns the values of missing
> > + * attributes from the tuple descriptor in that case. Also this function 
> > does
> > + * not support extracting system attributes.
> > + *
> > + * If the attribute needs to be fetched from the tuple, the function fills 
> > in
> > + * tts_values and tts_isnull arrays upto the required attnum.
> > + */
> > +Datum
> > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 
> > *offp,
> > + int attnum, bool
> > *isnull)
>
> I'm still *vehemently* opposed to the introduction of this.

You mean, you want to remove the att_isnull() optimization, right ?
Removed that code now. Directly deforming the tuple regardless of the
null attribute.

>
>
>
> > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> >   Datum   iDatum;
> >   boolisNull;
> >
> > - if (keycol != 0)
> > + if (keycol < 0)
> > + {
> > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot 
> > *)slot;
> > +
> > + /* Only heap tuples have system attributes. */
> > + Assert(TTS_IS_HEAPTUPLE(slot) || 
> > TTS_IS_BUFFERTUPLE(slot));
> > +
> > + iDatum = heap_getsysattr(hslot->tuple, keycol,
> > +  
> > slot->tts_tupleDescriptor,
> > +  
> > );
> > + }
> > + else if (keycol != 0)
> >   {
> >   /*
> >* Plain index column; get the value we need directly 
> > from the
>
> This now should access the system column via the slot, right?  There's
> other places like this IIRC.

Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
slot_getsysattr() now.

In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
planning to remove this definition since it would be a single line
function just calling slot_getsysattr().

In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
haven't removed the definition yet. I am planning to create a new
LLVMValueRef FuncSlotGetsysattr, and use that instead, in
build_ExecEvalSysVar(), or for that matter, I am thinking to revert
back build_ExecEvalSysVar() and instead have that code inline as in
HEAD.

>
>
>
> > diff --git a/src/backend/executor/execExprInterp.c 
> > b/src/backend/executor/execExprInterp.c
> > index 9d6e25a..1b4e726 100644
> > --- a/src/backend/executor/execExprInterp.c
> > +++ b/src/backend/executor/execExprInterp.c
> > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext 
> > *econtext, bool *isnull)
> >
> >   EEO_CASE(EEOP_INNER_SYSVAR)
> >   {
> > - int attnum = op->d.var.attnum;
> > - Datum   d;
> > -
> > - /* these asserts must match defenses in slot_getattr 
> > */
> > - Assert(innerslot->tts_tuple != NULL);
> > - Assert(innerslot->tts_tuple != 
> > &(innerslot->tts_minhdr));
> > -
> > - /* heap_getsysattr has sufficient defenses against 
> > bad attnums */
> > - d = heap_getsysattr(innerslot->tts_tuple, attnum,
> > - 
> > innerslot->tts_tupleDescriptor,
> > - op->resnull);
> > - *op->resvalue = d;
> > + ExecEvalSysVar(state, op, econtext, innerslot);
>
> These changes should be in a separate commit.

As mentioned above, now I am not using ExecEvalSysVar(), instead, I am
calling slot_getsysattr(). So no need of separate commit for that.

>
>
> > +const TupleTableSlotOps TTSOpsHeapTuple = {
> > + sizeof(HeapTupleTableSlot),
> > + .init = tts_heap_init,
>
> The first field should also use a named field (same in following cases).

Done.

>
>
> > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,/* tuple 
> > table */
> >   {
> >   TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> >
> > + slot->tts_cb->release(slot);
> >   /* Always release resources and reset the slot to empty */
> >   ExecClearTuple(slot);
> >   if (slot->tts_tupleDescriptor)
> > @@ -240,6 +1076,7 @@ void
> >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> >  {
> >   /* This should match 

Re: TupleTableSlot abstraction

2018-10-05 Thread Amit Khandekar
On Thu, 4 Oct 2018 at 22:59, Amit Khandekar  wrote:
>
> I have only done the below two changes yet. After doing that and
> rebasing with latest master, in the regression I got crashes, and I
> suspect the reason being that I have used Virtual tuple slot for the
> destination slot of execute_attr_map_slot(). I am analyzing it. I am
> anyway attaching the patches (v12) to give you an idea of how I have
> handled the below two items.

It seems to be some corruption here :

@@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate,

- tuple = ExecMaterializeSlot(slot);

+ if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
+ {
+ TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
+
+ Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
+ if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
+ ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
+ ExecCopySlot(es_slot, slot);
+ slot = es_slot;
+ }
+
+ tuple = ExecFetchSlotTuple(slot, true);

After the slotification of partition tuple conversion, the input slot
is a virtual tuple, and the above code seems to result in some
corruption which I have not finished analyzing. It only happens for
INSERT ON CONFLICT case with partitions.

On Wed, 26 Sep 2018 at 05:35, Andres Freund  wrote:
>
> > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,/* tuple 
> > table */
> >   {
> >   TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> >
> > + slot->tts_cb->release(slot);
> >   /* Always release resources and reset the slot to empty */
> >   ExecClearTuple(slot);
> >   if (slot->tts_tupleDescriptor)
> > @@ -240,6 +1076,7 @@ void
> >  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> >  {
> >   /* This should match ExecResetTupleTable's processing of one slot */
> > + slot->tts_cb->release(slot);
> >   Assert(IsA(slot, TupleTableSlot));
> >   ExecClearTuple(slot);
> >   if (slot->tts_tupleDescriptor)
>
> ISTM that release should be called *after* clearing the slot.

I am not sure what was release() designed to do. Currently all of the
implementers of this function are empty. Was it meant for doing
ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this
*before* clearing the tuple might be because the clear() might have
already cleared some handles that release() might need.

>
>
> > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver 
> > *self)
> >   TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> >   HeapTuple   tuple;
> >   shm_mq_result result;
> > + booltuple_copied = false;
> > +
> > + /* Get the tuple out of slot, if necessary converting the slot's 
> > contents
> > +  * into a heap tuple by copying. In the later case we need to free 
> > the copy.
> > +  */
> > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > + {
> > + tuple = ExecFetchSlotTuple(slot, true);
> > + tuple_copied = false;
> > + }
> > + else
> > + {
> > + tuple = ExecCopySlotTuple(slot);
> > + tuple_copied = true;
> > + }
>
> This seems like a bad idea to me.  We shouldn't hardcode slots like
> this.  I've previously argued that we should instead allow
> ExecFetchSlotTuple() for all types of tuples, but add a new bool
> *shouldFree argument, which will then allow the caller to free the
> tuple.  We gain zilch by having this kind of logic in multiple callers.

How about having a separate ExecFetchSlotHeapTuple() for many of the
callers where it is known that the tuple is a heap/buffer tuple ? And
in rare places such as above where slot type is not known, we can have
ExecFetchSlotTuple() which would have an extra shouldFree parameter.



Re: TupleTableSlot abstraction

2018-10-04 Thread Amit Khandekar
I have only done the below two changes yet. After doing that and
rebasing with latest master, in the regression I got crashes, and I
suspect the reason being that I have used Virtual tuple slot for the
destination slot of execute_attr_map_slot(). I am analyzing it. I am
anyway attaching the patches (v12) to give you an idea of how I have
handled the below two items.

On Wed, 26 Sep 2018 at 05:09, Andres Freund  wrote:
>
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > The attached v11 tar has the above set of changes.
>
> - I've pushed 0003 - although that commit seems to have included a lot
>   of things unrelated to the commit message. I guess two patches have
>   accidentally been merged?  Could you split out the part of the changes
>   that was mis-squashed?

Yeah, it indeed looks like it had unrelated things, mostly the changes
that moved the slot attribute functions into execTuples.c . I have
included this change as the very first patch in the patch series.


>> From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001
>> From: Ashutosh Bapat 
>> Date: Fri, 31 Aug 2018 10:53:42 +0530
>> Subject: [PATCH 08/14] Change tuple table slot creation routines to suite
>>  tuple table slot abstraction
>>
>> This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot,
>> ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot,
>> ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to
>> accept TupleTableSlotType as a new parameter. Change all their calls.
>>
>> Ashutosh Bapat and Andres Freund
>>
>> This by itself won't compile. Neither the tuple table slot abstraction
>> patch would compile and work without this change. Both of those need
>> to be committed together.
>
> I don't like this kind of split - all commits should individually
> compile. I think we should instead introduce dummy / empty structs for
>  etc, and add the parameters necessary to pass them
> through. And then move this patch to *before* the "core" abstract slot
> patch.  That way every commit, but the super verbose stuff is still
> split out.
>

Done. Moved this patch before the core one. In this patch, just
declared the global variables of type TupleTableSlotOps, without
initializing them.

I have tried to make sure individual patches compile successfully by
shuffling around the changes to the right patch, but there is one
particular patch that still gives error. Will fix that later.

I will handle the other review comments in the next patch series.


pg_abstract_tts_patches_v12.tar.gz
Description: GNU Zip compressed data


Re: TupleTableSlot abstraction

2018-10-01 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund  wrote 
> in <20180925234509.3hrrf6tmvy5tf...@alap3.anarazel.de>
>> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
>>> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
>>> This reduces the size of TupleTableSlot since each bool member takes
>>> at least a byte on the platforms where bool is defined as a char.

> About bitfields, an advantage of it is debugger awareness. We
> don't need to look aside to the definitions of bitwise macros
> while using debugger. And the current code is preserved in
> appearance by using it.

FWIW, I would expect a change like this to be a net loss performance-wise
on most platforms.  Testing the contents of a byte-wide variable is pretty
cheap on any architecture invented later than ~ 1970.  Testing a bit,
though, requires a masking operation that is not free.  I am not seeing
how making TupleTableSlot a little smaller buys that back ... we don't
normally have that many active slots in a plan.

regards, tom lane



Re: TupleTableSlot abstraction

2018-10-01 Thread Kyotaro HORIGUCHI
At Tue, 25 Sep 2018 16:45:09 -0700, Andres Freund  wrote in 
<20180925234509.3hrrf6tmvy5tf...@alap3.anarazel.de>
> Hi,
> 
> On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> > Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> > 
> > Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> > This reduces the size of TupleTableSlot since each bool member takes
> > at least a byte on the platforms where bool is defined as a char.
> > 
> > Ashutosh Bapat and Andres Freund
> 
> > +
> > +/* true = slot is empty */
> > +#defineTTS_ISEMPTY (1 << 1)
> > +#define IS_TTS_EMPTY(slot) ((slot)->tts_flags & TTS_ISEMPTY)
> > +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> > +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)
> 
> The flag stuff is the right way, but I'm a bit dubious about the
> accessor functions.  I can see open-coding the accesses, using the
> macros, or potentially going towards using bitfields.
> 
> Any comments?

Currently we have few setter/resetter function(macro)s for such
simple operations. FWIW open-coding in the following two looks
rather easier to me.

> if (IS_TTS_EMPTY(slot))
> if (slot->tts_flags & TTS_ISEMPTY)


About bitfields, an advantage of it is debugger awareness. We
don't need to look aside to the definitions of bitwise macros
while using debugger. And the current code is preserved in
appearance by using it.

> if (slot->tts_isempty)
> slot->tts_isempty = true;

In short, +1 from me to use bitfields. Coulnd't we use bitfield
here, possiblly in other places then?


=
Not related to tuple slots, in other places, like infomask, we
handle a set of bitmap values altogether.


> infomask = tuple->t_data->t_infomask;

Bare bitfields are a bit inconvenient for the use. It gets
simpler using C11 anonymous member but not so bothersome even in
C99.  Anyway I don't think we jump into that immediately.

infomask.all = tuple->t_data->t_infomask.all;

- if (!HeapTupleHeaderXminCommitted(tuple))
C99> if (tuple->t_infomask.b.xmin_committed)
C11> if (tuple->t_infomask.xmin_committed)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: TupleTableSlot abstraction

2018-09-25 Thread Andres Freund
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> The attached v11 tar has the above set of changes.


> Subject: [PATCH 07/14] Restructure TupleTableSlot to allow tuples other than
>  HeapTuple

> +
> +/*
> + * This is a function used by all getattr() callbacks which deal with a heap
> + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> + * minimal tuple.
> + *
> + * heap_getattr considers any attnum beyond the attributes available in the
> + * tuple as NULL. This function however returns the values of missing
> + * attributes from the tuple descriptor in that case. Also this function does
> + * not support extracting system attributes.
> + *
> + * If the attribute needs to be fetched from the tuple, the function fills in
> + * tts_values and tts_isnull arrays upto the required attnum.
> + */
> +Datum
> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> + int attnum, bool
> *isnull)

I'm still *vehemently* opposed to the introduction of this.



> @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
>   Datum   iDatum;
>   boolisNull;
>  
> - if (keycol != 0)
> + if (keycol < 0)
> + {
> + HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> +
> + /* Only heap tuples have system attributes. */
> + Assert(TTS_IS_HEAPTUPLE(slot) || 
> TTS_IS_BUFFERTUPLE(slot));
> +
> + iDatum = heap_getsysattr(hslot->tuple, keycol,
> +  
> slot->tts_tupleDescriptor,
> +  
> );
> + }
> + else if (keycol != 0)
>   {
>   /*
>* Plain index column; get the value we need directly 
> from the

This now should access the system column via the slot, right?  There's
other places like this IIRC.



> diff --git a/src/backend/executor/execExprInterp.c 
> b/src/backend/executor/execExprInterp.c
> index 9d6e25a..1b4e726 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, 
> bool *isnull)
>  
>   EEO_CASE(EEOP_INNER_SYSVAR)
>   {
> - int attnum = op->d.var.attnum;
> - Datum   d;
> -
> - /* these asserts must match defenses in slot_getattr */
> - Assert(innerslot->tts_tuple != NULL);
> - Assert(innerslot->tts_tuple != 
> &(innerslot->tts_minhdr));
> -
> - /* heap_getsysattr has sufficient defenses against bad 
> attnums */
> - d = heap_getsysattr(innerslot->tts_tuple, attnum,
> - 
> innerslot->tts_tupleDescriptor,
> - op->resnull);
> - *op->resvalue = d;
> + ExecEvalSysVar(state, op, econtext, innerslot);

These changes should be in a separate commit.


> +const TupleTableSlotOps TTSOpsHeapTuple = {
> + sizeof(HeapTupleTableSlot),
> + .init = tts_heap_init,

The first field should also use a named field (same in following cases).


> @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable,/* tuple table 
> */
>   {
>   TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
>  
> + slot->tts_cb->release(slot);
>   /* Always release resources and reset the slot to empty */
>   ExecClearTuple(slot);
>   if (slot->tts_tupleDescriptor)
> @@ -240,6 +1076,7 @@ void
>  ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
>  {
>   /* This should match ExecResetTupleTable's processing of one slot */
> + slot->tts_cb->release(slot);
>   Assert(IsA(slot, TupleTableSlot));
>   ExecClearTuple(slot);
>   if (slot->tts_tupleDescriptor)

ISTM that release should be called *after* clearing the slot.


> @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver 
> *self)
>   TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
>   HeapTuple   tuple;
>   shm_mq_result result;
> + booltuple_copied = false;
> +
> + /* Get the tuple out of slot, if necessary converting the slot's 
> contents
> +  * into a heap tuple by copying. In the later case we need to free the 
> copy.
> +  */
> + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> + {
> + tuple = ExecFetchSlotTuple(slot, true);
> + tuple_copied = false;
> + }
> + else
> + {
> + tuple = ExecCopySlotTuple(slot);
> + 

Re: TupleTableSlot abstraction

2018-09-25 Thread Andres Freund
Hi,

On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> Subject: [PATCH 05/14] Use tts_flags instead of previous bool members
> 
> Pack the boolean members in TupleTableSlot into a 16 bit tts_flags.
> This reduces the size of TupleTableSlot since each bool member takes
> at least a byte on the platforms where bool is defined as a char.
> 
> Ashutosh Bapat and Andres Freund

> +
> +/* true = slot is empty */
> +#define  TTS_ISEMPTY (1 << 1)
> +#define IS_TTS_EMPTY(slot)   ((slot)->tts_flags & TTS_ISEMPTY)
> +#define SET_TTS_EMPTY(slot) ((slot)->tts_flags |= TTS_ISEMPTY)
> +#define RESET_TTS_EMPTY(slot) ((slot)->tts_flags &= ~TTS_ISEMPTY)

The flag stuff is the right way, but I'm a bit dubious about the
accessor functions.  I can see open-coding the accesses, using the
macros, or potentially going towards using bitfields.

Any comments?

- Andres



Re: TupleTableSlot abstraction

2018-09-25 Thread Andres Freund
On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote:
> The attached v11 tar has the above set of changes.

- I've pushed 0003 - although that commit seems to have included a lot
  of things unrelated to the commit message. I guess two patches have
  accidentally been merged?  Could you split out the part of the changes
  that was mis-squashed?
- I've pushed an extended version of 0001.
- I've pushed 0002, after some minor polishing
- I've pushed 0004

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-09-04 Thread Amit Khandekar
On 28 August 2018 at 22:43, Ashutosh Bapat
 wrote:
> On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund  wrote:
>
>>
>>> @@ -2883,7 +2885,7 @@ CopyFrom(CopyState cstate)
>>>   if (slot == NULL)   /* "do nothing" */
>>>   skip_tuple = true;
>>>   else/* trigger might have 
>>> changed tuple */
>>> - tuple = ExecMaterializeSlot(slot);
>>> + tuple = ExecFetchSlotTuple(slot, true);
>>>   }
>>
>> Could we do the Materialize vs Fetch vs Copy change separately?
>>
>
> Ok. I will do that.

Ashutosh offlist has shared with me
0006-Rethink-ExecMaterializeSlot-ExecFetchSlotTuple-in-th.patch which
contains the separated changes. The attached tar includes this
additional patch.


>>
>>> @@ -1590,7 +1590,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>>   TupleTableSlot *slot,
>>>   uint32 hashvalue)
>>>  {
>>> - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot);
>>> + boolshouldFree;
>>> + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, );
>>>   int bucketno;
>>>   int batchno;
>>>
>>> @@ -1664,6 +1665,9 @@ ExecHashTableInsert(HashJoinTable hashtable,
>>> hashvalue,
>>> 
>>> >innerBatchFile[batchno]);
>>>   }
>>> +
>>> + if (shouldFree)
>>> + heap_free_minimal_tuple(tuple);
>>>  }
>>
>> Hm, how about splitting these out?
>
> Split into a separate patch? It doesn't make sense to add this patch
> before 0006 since the slots in those patches can "own" a minimal
> tuple. Let's add a patch after 0006 i.e. tuple table abstraction
> patch. Will do.

Ashutosh offlist has shared with me
0009-Rethink-ExecFetchSlotMinimalTuple.patch which contains the above
separated changes. The attached tar includes this additional patch.


On 31 August 2018 at 20:50, Andres Freund  wrote:
>> On 31 August 2018 at 10:05, Amit Khandekar  wrote:
> Hi,
>
> On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
>> On 28 August 2018 at 22:43, Ashutosh Bapat
>> >> I think I was wrong at saying that we should remove this. I think you
>> >> were right that it should become a callback...
>>
>> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
>> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
>> > a wrapper around getsysattr() callback.
>
> Right.
>
>> One option is that the getsysattr() callback function returns false if
>> system attributes are not supported for that slot type. Other option
>> is that in the not-supported case, the function errors out, meaning
>> that the caller should be aware that the slot type is the one that
>> supports system attributes.
>
>> I had prepared changes for the first option, but Ashutosh Bapat
>> offlist made me realize that it's worth considering the 2nd option(
>> i.e. erroring out).
>
> I think we should error out.

I did these changes in the existing 0007-Restructure-TupleTableSlot...
patch. This patch now contains changes for the new getsysattr()
callback. I used the 2nd option : erroring out.

On 31 August 2018 at 10:05, Amit Khandekar  wrote:
> The only use case where slot_getsysattr() is called right now is
> execCurrentOf(), and it currently checks the bool return value of
> slot_getsysattr() and prints a user-friendly message if false is
> returned. Looking at the code (commit 8f5ac440430ab), it seems that we
> want to continue to have a user-friendly message for some unhandled
> cases like custom scans. So perhaps for now it's ok to go with the
> first option where getsysattr callback returns false for slot types
> that don't have system attributes.

I noticed that the issue for which commit 8f5ac440430ab was done was
that the tid was attempted to be fetched from a tuple that doesn't
support system attribute (virtual tuple), although the report
complained about a non-user-friendly message being given. So similarly
for custom scan, since it is not handled similarly, for now we can
continue to give an "unfriendly" message. The getsysattr() callbacks
will return something like "virtual tuple table slot does not have
system atttributes" if the slot does not support system attributes.


The attached v11 tar has the above set of changes.

Thanks
-Amit Khandekar


pg_abstract_tts_patches_v11.tar.gz
Description: GNU Zip compressed data


Re: TupleTableSlot abstraction

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 10:05:05 +0530, Amit Khandekar wrote:
> On 28 August 2018 at 22:43, Ashutosh Bapat
> >> I think I was wrong at saying that we should remove this. I think you
> >> were right that it should become a callback...
> 
> > We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
> > want to reinstantiate those as well? If so, slot_getsysattr() becomes
> > a wrapper around getsysattr() callback.

Right.

> One option is that the getsysattr() callback function returns false if
> system attributes are not supported for that slot type. Other option
> is that in the not-supported case, the function errors out, meaning
> that the caller should be aware that the slot type is the one that
> supports system attributes.

> I had prepared changes for the first option, but Ashutosh Bapat
> offlist made me realize that it's worth considering the 2nd option(
> i.e. erroring out).

I think we should error out.


> >> I still think this is an optimization with a negative benefit,
> >> especially as it requires an extra callback. We should just rely on
> >> slot_deform_tuple and then access that. That'll also just access the
> >> null bitmap for the relevant column, and it'll make successive accesses
> >> cheaper.
> >
> > I don't understand why we have differing implementations for
> > slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
> > you are saying is true, we should have implemented all the first and
> > last as a call to slot_getsomeattrs() followed by returing values from
> > tts_values and tts_isnull. Since this is refactoring work, I am trying
> > not to change the existing functionality of those functions.
> 
> I agree that we should not change the way in which slot_getattr()
> finds the attr; i.e. first call att_isnull(), and only then try to
> deform the tuple. I mean there must be some good reason that is done
> on HEAD. Maybe we can change this separately after investigation, but
> not as part of the tuple abstraction patch.

There's really no good reason for the behaviour as it exists on HEAD. It
already can cause worse performance there. The price to pay for
continuing to have an optimization which isn't actually optimizing
anything is way too high if it requires us to have multiple functionally
unnecessary callbacks.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-08-30 Thread Amit Khandekar
On 28 August 2018 at 22:43, Ashutosh Bapat
 wrote:
> On Fri, Aug 24, 2018 at 6:46 AM, Andres Freund  wrote:
>>
>>> -/*
>>> - * slot_getsysattr
>>> - *   This function fetches a system attribute of the slot's 
>>> current tuple.
>>> - *   Unlike slot_getattr, if the slot does not contain system 
>>> attributes,
>>> - *   this will return false (with a NULL attribute value) instead 
>>> of
>>> - *   throwing an error.
>>> - */
>>> -bool
>>> -slot_getsysattr(TupleTableSlot *slot, int attnum,
>>> - Datum *value, bool *isnull)
>>> -{
>>> - HeapTuple   tuple = slot->tts_tuple;
>>> -
>>> - Assert(attnum < 0); /* else caller error */
>>> - if (tuple == NULL ||
>>> - tuple == &(slot->tts_minhdr))
>>> - {
>>> - /* No physical tuple, or minimal tuple, so fail */
>>> - *value = (Datum) 0;
>>> - *isnull = true;
>>> - return false;
>>> - }
>>> - *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, 
>>> isnull);
>>> - return true;
>>> -}
>>
>> I think I was wrong at saying that we should remove this. I think you
>> were right that it should become a callback...
>
> We have replaced all slot_getsysattrs() with heap_getsysattr(). Do you
> want to reinstantiate those as well? If so, slot_getsysattr() becomes
> a wrapper around getsysattr() callback.

One option is that the getsysattr() callback function returns false if
system attributes are not supported for that slot type. Other option
is that in the not-supported case, the function errors out, meaning
that the caller should be aware that the slot type is the one that
supports system attributes.

I had prepared changes for the first option, but Ashutosh Bapat
offlist made me realize that it's worth considering the 2nd option(
i.e. erroring out).

The only use case where slot_getsysattr() is called right now is
execCurrentOf(), and it currently checks the bool return value of
slot_getsysattr() and prints a user-friendly message if false is
returned. Looking at the code (commit 8f5ac440430ab), it seems that we
want to continue to have a user-friendly message for some unhandled
cases like custom scans. So perhaps for now it's ok to go with the
first option where getsysattr callback returns false for slot types
that don't have system attributes.

>
>>
>>
>>> +/*
>>> + * This is a function used by all getattr() callbacks which deal with a 
>>> heap
>>> + * tuple or some tuple format which can be represented as a heap tuple 
>>> e.g. a
>>> + * minimal tuple.
>>> + *
>>> + * heap_getattr considers any attnum beyond the attributes available in the
>>> + * tuple as NULL. This function however returns the values of missing
>>> + * attributes from the tuple descriptor in that case. Also this function 
>>> does
>>> + * not support extracting system attributes.
>>> + *
>>> + * If the attribute needs to be fetched from the tuple, the function fills 
>>> in
>>> + * tts_values and tts_isnull arrays upto the required attnum.
>>> + */
>>> +Datum
>>> +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 
>>> *offp,
>>> + int attnum, bool *isnull)
>>> +{
>>> + HeapTupleHeader tup = tuple->t_data;
>>> + Assert(slot->tts_nvalid < attnum);
>>> +
>>> + Assert(attnum > 0);
>>> +
>>> + if (attnum > HeapTupleHeaderGetNatts(tup))
>>> + return getmissingattr(slot->tts_tupleDescriptor, attnum, 
>>> isnull);
>>> +
>>> + /*
>>> +  * check if target attribute is null: no point in groveling through 
>>> tuple
>>> +  */
>>> + if (HeapTupleHasNulls(tuple) && att_isnull(attnum - 1, tup->t_bits))
>>> + {
>>> + *isnull = true;
>>> + return (Datum) 0;
>>> + }
>>
>> I still think this is an optimization with a negative benefit,
>> especially as it requires an extra callback. We should just rely on
>> slot_deform_tuple and then access that. That'll also just access the
>> null bitmap for the relevant column, and it'll make successive accesses
>> cheaper.
>
> I don't understand why we have differing implementations for
> slot_attisnull(), slot_getsomeattrs(), slot_getattr() in HEAD. If what
> you are saying is true, we should have implemented all the first and
> last as a call to slot_getsomeattrs() followed by returing values from
> tts_values and tts_isnull. Since this is refactoring work, I am trying
> not to change the existing functionality of those functions.

I agree that we should not change the way in which slot_getattr()
finds the attr; i.e. first call att_isnull(), and only then try to
deform the tuple. I mean there must be some good reason that is done
on HEAD. Maybe we can change this separately after investigation, but
not as part of the tuple abstraction patch.

--

BTW, on HEAD, for dropped attribute slot_getattr() returns null datum,
which hasn't been done in the patch 

Re: TupleTableSlot abstraction

2018-08-30 Thread Alvaro Herrera
Man, how I dislike patches in tarballs.

0002 says:

+ * shouldFree is set 'true' since a tuple stored on a disk page should not be
+ * pfree'd.

Surely you mean 'false' :-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: TupleTableSlot abstraction

2018-08-28 Thread Ashutosh Bapat
verted back all those
changes. I think we should change the JIT code automatically when
function signatures/structure definitions change OR at least
compilation should fail indicating the difference between JIT code and
normal code. Investigating a crash takes a hell lot of time and is not
easy without a custom LLVM build. I doubt if every PostgreSQL
developer would want to do that.

>
>>   /*
>> - * Fill in missing values for a TupleTableSlot.
>> - *
>> - * This is only exposed because it's needed for JIT compiled tuple
>> - * deforming. That exception aside, there should be no callers outside of 
>> this
>> - * file.
>> - */
>> -void
>> -slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
>> -{
>> - AttrMissing *attrmiss = NULL;
>> - int missattnum;
>> -
>> - if (slot->tts_tupleDescriptor->constr)
>> - attrmiss = slot->tts_tupleDescriptor->constr->missing;
>> -
>> - if (!attrmiss)
>> - {
>> - /* no missing values array at all, so just fill everything in 
>> as NULL */
>> - memset(slot->tts_values + startAttNum, 0,
>> -(lastAttNum - startAttNum) * sizeof(Datum));
>> - memset(slot->tts_isnull + startAttNum, 1,
>> -(lastAttNum - startAttNum) * sizeof(bool));
>> - }
>> - else
>> - {
>> - /* if there is a missing values array we must process them one 
>> by one */
>> - for (missattnum = startAttNum;
>> -  missattnum < lastAttNum;
>> -  missattnum++)
>> - {
>> - slot->tts_values[missattnum] = 
>> attrmiss[missattnum].am_value;
>> - slot->tts_isnull[missattnum] = 
>> !attrmiss[missattnum].am_present;
>> -     }
>> - }
>> -}
>
> I would split out these moves into a separate commit, they are trivially
> committable separately. The commit's pretty big already, and that'd make
> it easier to see the actual differences.

I have included some of those movements in the same patch which
changes the type of tts_nvalid. Functions like slot_attisnull() are
changed to be inline static functions in tuptable.h. Those functions
are now inline since they are just wrapper around slot specific
callbacks. Thus without the TupleTableSlot abstraction patch it's not
possible to mark them "inline and thus move them to a header file e.g.
tuptable.h. For now I have left these kinds of movements in
TupleTableSlot abstraction patch.

>
>
>> -
>> -/*
>>   * heap_compute_data_size
>>   *   Determine size of the data area of a tuple to be constructed
>>   */
>> @@ -1407,10 +1370,9 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc 
>> tupleDesc,
>>   *   re-computing information about previously extracted attributes.
>>   *   slot->tts_nvalid is the number of attributes already extracted.
>>   */
>> -static void
>> -slot_deform_tuple(TupleTableSlot *slot, int natts)
>> +void
>> +slot_deform_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp, int 
>> natts)
>>  {
>
> This should be renamed to include "heap" in the name, as it's not going
> to be usable for, say, zheap.

LGTM. Done.

>
>
>> -/*
>> - * slot_getsysattr
>> - *   This function fetches a system attribute of the slot's current 
>> tuple.
>> - *   Unlike slot_getattr, if the slot does not contain system 
>> attributes,
>> - *   this will return false (with a NULL attribute value) instead of
>> - *   throwing an error.
>> - */
>> -bool
>> -slot_getsysattr(TupleTableSlot *slot, int attnum,
>> - Datum *value, bool *isnull)
>> -{
>> - HeapTuple   tuple = slot->tts_tuple;
>> -
>> - Assert(attnum < 0); /* else caller error */
>> - if (tuple == NULL ||
>> - tuple == &(slot->tts_minhdr))
>> - {
>> - /* No physical tuple, or minimal tuple, so fail */
>> - *value = (Datum) 0;
>> - *isnull = true;
>> - return false;
>> - }
>> - *value = heap_getsysattr(tuple, attnum, slot->tts_tupleDescriptor, 
>> isnull);
>> - return true;
>> -}
>
> I think I was wrong at saying that we should remove this. I think you
> were right that it should become a callback...

We have replaced all slot_getsysattrs(

Re: TupleTableSlot abstraction

2018-08-23 Thread Andres Freund
Hi,

On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote:
> Sorry, forgot about that. Here's the patch set with that addressed.

Btw, you attach files as tar.zip, but they're actually gzip
compressed...


> From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat 
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and
>  ExecStoreBufferHeapTuple
> 
> ExecStoreTuple() accepts a heap tuple from a buffer or constructed
> on-the-fly. In the first case the caller passed a valid buffer and in
> the later case it passes InvalidBuffer. In the first case,
> ExecStoreTuple() pins the given buffer and in the later case it
> records shouldFree flag. The function has some extra checks to
> differentiate between the two cases.  The usecases never overlap thus
> spending extra cycles in checks is useless. Hence separate these
> usecases into separate functions ExecStoreHeapTuple() to store
> on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk
> tuple from a buffer. This allows to shave some extra cycles while
> storing a tuple in the slot.

It doesn't *yet* allow shaving extra cycles, no?

> *  SLOT ACCESSORS
>   *   ExecSetSlotDescriptor   - set a slot's tuple descriptor
> - *   ExecStoreTuple  - store a physical tuple in the 
> slot
> + *   ExecStoreHeapTuple  - store an on-the-fly heap 
> tuple in the slot
> + *   ExecStoreBufferHeapTuple - store an on-disk heap tuple in the 
> slot
>   *   ExecStoreMinimalTuple   - store a minimal physical tuple in the 
> slot
>   *   ExecClearTuple  - clear contents of a slot
>   *   ExecStoreVirtualTuple   - mark slot as containing a virtual
>   *   tuple

I'd advocate for a separate patch ripping these out, they're almost
always out of date.


>  /* 
> - *   ExecStoreTuple
> + *   ExecStoreHeapTuple
>   *
> - *   This function is used to store a physical tuple into a specified
> + *   This function is used to store an on-the-fly physical tuple 
> into a specified
>   *   slot in the tuple table.
>   *
>   *   tuple:  tuple to store
>   *   slot:   slot to store it in
> - *   buffer: disk buffer if tuple is in a disk page, else 
> InvalidBuffer
>   *   shouldFree: true if ExecClearTuple should pfree() the tuple
>   *   when done with it
>   *
> - * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
> - * on the buffer which is held until the slot is cleared, so that the tuple
> - * won't go away on us.
> + * shouldFree is normally set 'true' for tuples constructed on-the-fly.  But 
> it
> + * can be 'false' when the referenced tuple is held in a tuple table slot
> + * belonging to a lower-level executor Proc node.  In this case the 
> lower-level
> + * slot retains ownership and responsibility for eventually releasing the
> + * tuple.  When this method is used, we must be certain that the upper-level
> + * Proc node will lose interest in the tuple sooner than the lower-level one
> + * does!  If you're not certain, copy the lower-level tuple with 
> heap_copytuple
> + * and let the upper-level table slot assume ownership of the copy!
> + *
> + * Return value is just the passed-in slot pointer.
> + * 
> + */
> +TupleTableSlot *
> +ExecStoreHeapTuple(HeapTuple tuple,
> +TupleTableSlot *slot,
> +bool shouldFree)
> +{
> + /*
> +  * sanity checks
> +  */
> + Assert(tuple != NULL);
> + Assert(slot != NULL);
> + Assert(slot->tts_tupleDescriptor != NULL);
> +
> + /*
> +  * Free any old physical tuple belonging to the slot.
> +  */
> + if (slot->tts_shouldFree)
> + heap_freetuple(slot->tts_tuple);
> + if (slot->tts_shouldFreeMin)
> + heap_free_minimal_tuple(slot->tts_mintuple);
> +
> + /*
> +  * Store the new tuple into the specified slot.
> +  */
> + slot->tts_isempty = false;
> + slot->tts_shouldFree = shouldFree;
> + slot->tts_shouldFreeMin = false;
> + slot->tts_tuple = tuple;
> + slot->tts_mintuple = NULL;
> +
> + /* Mark extracted state invalid */
> + slot->tts_nvalid = 0;
> +
> + return slot;
> +}

Uh, there could very well be a buffer previously stored in the slot, no?
This can't currently be applied independently afaict.


> From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat 
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber

> @@ -125,7 +125,7 @@ typedef struct TupleTableSlot
>   MemoryContext tts_mcxt; /* slot itself is in this context */
>   Buffer  tts_buffer; /* 

Re: TupleTableSlot abstraction

2018-08-20 Thread Andres Freund
On 2018-08-20 17:51:38 +0530, Ashutosh Bapat wrote:
> > I also noticed an independent issue in your changes to
> > ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
> > ScanState have a Scan node in their plan. Look e.g. at Material, Sort
> > etc. So currently your scanrelid access is often just uninitialized
> > data.
> 
> I have incorporated changes in your patches into the relevant patches
> in the updated patch-set. With this patch-set make check-world passes
> for me.

Have you addressed the issue I commented on above?

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-08-18 Thread Andres Freund
On 2018-08-17 01:07:06 -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
> > We need to add LLVM code to fetch tts_flags and
> > perform bit operation on it to get or set slow property. I haven't
> > found any precedence for LLVM bit operations in postgresql's JIT code.
> 
> There are several, look for the infomask accesses in
> slot_compiler_deform.
> 
> I'll try to do the adaption later today.

Attached is a series of patches doing so.  The previous implementation
of sysvar accesses wasn't actually working - the slot argument was
uninitialized.

I also noticed an independent issue in your changes to
ExecInitScanTupleSlot(): You can't assume that the plan belonging to the
ScanState have a Scan node in their plan. Look e.g. at Material, Sort
etc. So currently your scanrelid access is often just uninitialized
data.

Greetings,

Andres Freund
>From 68ab969d94964a8e6cdfac974d73ef559546ffa0 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 18 Aug 2018 08:12:52 -0700
Subject: [PATCH 1/4] Fix slot type used in subqueryscan.

This later becomes relevant because it prevents upper layers from
making assumptions about the format of the slots.
---
 src/backend/executor/nodeSubqueryscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1e83e673939..6da918894fd 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -130,7 +130,7 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
 	 */
 	ExecInitScanTupleSlot(estate, >ss,
 		  ExecGetResultType(subquerystate->subplan),
-		  );
+		  );
 
 	/*
 	 * Initialize result slot, type and projection.
-- 
2.18.0.rc2.dirty

>From 50ca33b96bf6017f04863486763e2e3250c5cc0a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 18 Aug 2018 08:13:49 -0700
Subject: [PATCH 2/4] XXX: Copy slot in nodeMaterial.c

This is needed because otherwise upper layers can't make correct
assumptions about the type of slot handed up.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/executor/nodeMaterial.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 66c4ef6ca90..fc36aedfa26 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -146,10 +146,8 @@ ExecMaterial(PlanState *pstate)
 		if (tuplestorestate)
 			tuplestore_puttupleslot(tuplestorestate, outerslot);
 
-		/*
-		 * We can just return the subplan's returned tuple, without copying.
-		 */
-		return outerslot;
+		ExecCopySlot(slot, outerslot);
+		return slot;
 	}
 
 	/*
-- 
2.18.0.rc2.dirty

>From 5d6168601419ff044f197dfe5c1a2479f97496d1 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 18 Aug 2018 08:15:52 -0700
Subject: [PATCH 3/4] Fix JIT calls to ExecEvalSysVar().

The previous attempt at this called the function without initializing
the slot argument. Which unsurprisingly leads to crashes.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/jit/llvm/llvmjit_expr.c | 58 ++---
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 432f81e0dfb..5c878e45285 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -64,6 +64,9 @@ static void build_EvalXFunc(LLVMBuilderRef b, LLVMModuleRef mod,
 const char *funcname,
 LLVMValueRef v_state, LLVMValueRef v_econtext,
 ExprEvalStep *op);
+static void build_ExecEvalSysVar(LLVMBuilderRef b, LLVMModuleRef mod,
+LLVMValueRef v_state, LLVMValueRef v_econtext,
+ExprEvalStep *op, LLVMValueRef v_slot);
 static LLVMValueRef create_LifetimeEnd(LLVMModuleRef mod);
 
 
@@ -403,14 +406,22 @@ llvm_compile_expr(ExprState *state)
 }
 
 			case EEOP_INNER_SYSVAR:
+build_ExecEvalSysVar(b, mod, v_state,
+	 v_econtext, op, v_innerslot);
+LLVMBuildBr(b, opblocks[i + 1]);
+break;
+
 			case EEOP_OUTER_SYSVAR:
+build_ExecEvalSysVar(b, mod, v_state,
+	 v_econtext, op, v_outerslot);
+LLVMBuildBr(b, opblocks[i + 1]);
+break;
+
 			case EEOP_SCAN_SYSVAR:
-{
-	build_EvalXFunc(b, mod, "ExecEvalSysVar",
-	v_state, v_econtext, op);
-	LLVMBuildBr(b, opblocks[i + 1]);
-	break;
-}
+build_ExecEvalSysVar(b, mod, v_state,
+		 v_econtext, op, v_scanslot);
+LLVMBuildBr(b, opblocks[i + 1]);
+break;
 
 			case EEOP_WHOLEROW:
 build_EvalXFunc(b, mod, "ExecEvalWholeRowVar",
@@ -2634,6 +2645,41 @@ build_EvalXFunc(LLVMBuilderRef b, LLVMModuleRef mod, const char *funcname,
   params, lengthof(params), "");
 }
 
+static void
+build_ExecEvalSysVar(LLVMBuilderRef b, LLVMModuleRef mod,
+	 LLVMValueRef v_state, LLVMValueRef v_econtext,
+	

Re: TupleTableSlot abstraction

2018-08-17 Thread Andres Freund
Hi,

On 2018-08-17 12:10:20 +0530, Ashutosh Bapat wrote:
> We need to add LLVM code to fetch tts_flags and
> perform bit operation on it to get or set slow property. I haven't
> found any precedence for LLVM bit operations in postgresql's JIT code.

There are several, look for the infomask accesses in
slot_compiler_deform.

I'll try to do the adaption later today.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-08-09 Thread Ashutosh Bapat
On Wed, Aug 8, 2018 at 5:07 PM, Ashutosh Bapat
 wrote:

Amit Khandekar offlist told me that the previous patch-set doesn't
apply cleanly on the latest head.

PFA patches rebased on 31380bc7c204e7cfa9c9e1c62947909e2b38577c

>
> 3. compile with LLVM and fix any compilation and regression errors.

 When I compiled server with just 0003 applied with LLVM, the
 compilation went well, but there was a server crash. That patch
 changes type of tts_nvalid from int32 to AttrNumber. I tried debugging
 the crash with a debug LLVM build, but couldn't complete the work.
 Attached patch attrnumber_llvm_type.patch is my incomplete attempt to
 fix that crash. I think, we should make it easy to change the data
 types of the members in structures shared by JIT and non-JIT code, may
 be automatically create both copies of the code somehow. I will get
 back to this after addressing other TODOs.

>

still a TODO

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
<>


Re: TupleTableSlot abstraction

2018-08-06 Thread Ashutosh Bapat
On Mon, Aug 6, 2018 at 10:15 AM, Andres Freund  wrote:
> Hi,
>
> On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
>> I think, I explained why getattr() needs to be a separate callback.
>> There's a reason why slot_getattr() more code than just calling
>> slot_getsomeattrs() and return the required one - the cases when the
>> requested attribute is NULL or is missing from a tuple.  Depending
>> upon the tuple layout access to a simple attribute can be optimized
>> without spending cycles to extract attributes prior to that one.
>> Columnar store (I haven't seen Haribabu's patch), for example, stores
>> columns from multiple rows together and thus doesn't have a compact
>> version of tuple. In such cases extracting an individual attribute
>> directly is far cheaper than extracting all the previous attributes.
>
> OTOH, it means we continually access the null bitmap instead of just
> tts_isnull[i].

Nope. The slot_getattr implementation in these patches as well as in
the current master return from tts_isnull if the array has the
information.

>
> Your logic about not deforming columns in this case would hold for *any*
> deforming of previous columns as well. That's an optimization that we
> probably want to implement at some point (e.g. by building a bitmap of
> needed columns in the planner), but I don't think we should do it
> together with this already large patchset.

Agree about optimizing using a separate bitmap indicating validity of
a particular value.

>
>
>> Why should we force the storage API to extract all the attributes in
>> such a case?
>
> Because there's no need yet, and it complicates the API without
> corresponding benefit.

The earlier version of slot_getattr() was optimized to access a given
attribute. That must have been done for some reason. Leaving that
optimization aside for this work will cause regression in the cases
where that optimization matters.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-08-05 Thread Andres Freund
Hi,

On 2018-08-06 10:08:24 +0530, Ashutosh Bapat wrote:
> I think, I explained why getattr() needs to be a separate callback.
> There's a reason why slot_getattr() more code than just calling
> slot_getsomeattrs() and return the required one - the cases when the
> requested attribute is NULL or is missing from a tuple.  Depending
> upon the tuple layout access to a simple attribute can be optimized
> without spending cycles to extract attributes prior to that one.
> Columnar store (I haven't seen Haribabu's patch), for example, stores
> columns from multiple rows together and thus doesn't have a compact
> version of tuple. In such cases extracting an individual attribute
> directly is far cheaper than extracting all the previous attributes.

OTOH, it means we continually access the null bitmap instead of just
tts_isnull[i].

Your logic about not deforming columns in this case would hold for *any*
deforming of previous columns as well. That's an optimization that we
probably want to implement at some point (e.g. by building a bitmap of
needed columns in the planner), but I don't think we should do it
together with this already large patchset.


> Why should we force the storage API to extract all the attributes in
> such a case?

Because there's no need yet, and it complicates the API without
corresponding benefit.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-08-05 Thread Ashutosh Bapat
On Sun, Aug 5, 2018 at 3:49 PM, Andres Freund  wrote:
> Hi,
>
> Working on rebasing the pluggable storage patch on the current version
> of this.

Thanks. Please let me know if you see any issues.

>
> On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
>> Done. I also noticed that slot_getattr() optimizes the cases when the
>> requested attributes is NULL or is missing from a tuple. Given that a
>> custom TupleTableSlot type can have its own optimizations for the
>> same, added a new call back getattr() to obtain value of a given
>> attribute from slot. The callback is called from slot_getattr().
>
> I'm quite against this. This is just proliferation of callbacks without
> much use.  Why do you think this is helpful?  I think it's much better
> to go back to a single callback to deform here.
>

I think, I explained why getattr() needs to be a separate callback.
There's a reason why slot_getattr() more code than just calling
slot_getsomeattrs() and return the required one - the cases when the
requested attribute is NULL or is missing from a tuple.  Depending
upon the tuple layout access to a simple attribute can be optimized
without spending cycles to extract attributes prior to that one.
Columnar store (I haven't seen Haribabu's patch), for example, stores
columns from multiple rows together and thus doesn't have a compact
version of tuple. In such cases extracting an individual attribute
directly is far cheaper than extracting all the previous attributes.
Why should we force the storage API to extract all the attributes in
such a case?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: TupleTableSlot abstraction

2018-08-05 Thread Andres Freund
Hi,

Working on rebasing the pluggable storage patch on the current version
of this.

On 2018-07-26 17:09:08 +0530, Ashutosh Bapat wrote:
> Done. I also noticed that slot_getattr() optimizes the cases when the
> requested attributes is NULL or is missing from a tuple. Given that a
> custom TupleTableSlot type can have its own optimizations for the
> same, added a new call back getattr() to obtain value of a given
> attribute from slot. The callback is called from slot_getattr().

I'm quite against this. This is just proliferation of callbacks without
much use.  Why do you think this is helpful?  I think it's much better
to go back to a single callback to deform here.

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-03-13 Thread Andres Freund
On 2018-03-13 15:18:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> > 
> > I've recently been discussing with Robert how to abstract
> > TupleTableSlots in the light of upcoming developments around abstracting
> > storage away.  Besides that aspect I think we also need to make them
> > more abstract to later deal with vectorized excution, but I'm more fuzzy
> > on the details there.

> Hi, is this patch proposed for pg11?

I wish, but I don't see it happening unless I get a time compression
device from somewhere :(

Greetings,

Andres Freund



Re: TupleTableSlot abstraction

2018-02-25 Thread Alexander Korotkov
Hi!

Thank you for working on this subject.  See some my comments below.

On Wed, Feb 21, 2018 at 1:43 AM, Andres Freund  wrote:

> - TupleTableSlots have to contain all the necessary information for each
>   type of slot. Some implementations might require a buffer pin to be
>   hold (our heap), others pointers to multiple underlying points of data
>   (columns store), some need additional metadata (our MinimalTuple).
>
>   Unifying the information into one struct makes it much harder to
>   provide differing implementations without modifying core postgres
>   and/or increasing overhead for every user of slots.
>
>   I think the consequence of that is that we need to allow each type of
>   slot to have their own TupleTableSlot embedding struct.
>

+1, I think that it's reasonable to keep minimal common slot information in
TupleTableSlot struct and to let particular slot implementations to extend
it.

  Haribabu's patch solved this by adding a tts_storage parameter that
>   contains additional data, but imo that's unconvincing due to the added
>   indirection in very performance critical codepaths.
>

+1

- The way slots currently are implemented each new variant data is
>   stored in slots adds new branches to hot code paths like
>   ExecClearTuple(), and they're not extensible.
>
>   Therefore I think we need to move to callback based implementation. In
>   my tests, by moving the callback invocations into very thin inline
>   functions, the branch prediction accuracy actually goes sligthly
>   *up*.
>

Sounds interesting.  Besides branch prediction accuracy, what can you
say about overall performance?

- Currently we frequently convert from one way of representing a row
>   into another, in the same slot. We do so fairly implicilty in
>   ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...
>
>   The more we abstract specific storage representation away, the worse I
>   think that is. I think we should make such conversions explicit by
>   requiring transfers between two slots if a specific type is required.
>
>
> - We have a few codepaths that set fields on tuples that can't be
>   represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
>   ExecProcessReturning() will set tableoid.
>
>
> In my POC I turned TupleTableSlot into basically an abstract base class:
> struct TupleTableSlot
> {
> NodeTag type;
>
> uint16  flags;
> uint16  nvalid; /* # of valid values in tts_values
> */
>
> const TupleTableSlotOps *const cb;
>
> TupleDesc   tupleDescriptor;/* slot's tuple descriptor
> */
>
> Datum  *values; /* current per-attribute values */
> bool   *nulls;  /* current per-attribute null
> flags */
>
> /* can we optimize away? */
> MemoryContext mcxt; /* slot itself is in this context
> */
> };
>
> which then is inherited from for specific implementations of tuple
> slots:
>
> typedef struct HeapTupleTableSlot
> {
> TupleTableSlot base;
> HeapTuple   tuple;  /* physical tuple */
> uint32  off;/* saved state for
> slot_deform_tuple */
> } HeapTupleTableSlot;
>
> ...
> typedef struct MinimalTupleTableSlot
> {
> TupleTableSlot base;
> HeapTuple   tuple;  /* tuple wrapper */
> MinimalTuple mintuple;  /* minimal tuple, or NULL if none */
> HeapTupleData minhdr;   /* workspace for minimal-tuple-only case */
> uint32  off;/* saved state for
> slot_deform_tuple */
> } MinimalTupleTableSlot;
>
>
> typedef struct TupleTableSlotOps
> {
> void (*init)(TupleTableSlot *slot);
> void (*release)(TupleTableSlot *slot);
>
> void (*clear)(TupleTableSlot *slot);
>
> void (*getsomeattrs)(TupleTableSlot *slot, int natts);
>
> void (*materialize)(TupleTableSlot *slot);
>
> HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
> MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);
>
> HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
> MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
>
> } TupleTableSlotOps;
>
> when creating a slot one has to specify the type of slot one wants to
> use:
>
> typedef enum TupleTableSlotType
> {
> TTS_TYPE_VIRTUAL,
> TTS_TYPE_HEAPTUPLE,
> TTS_TYPE_MINIMALTUPLE,
> TTS_TYPE_BUFFER
> } TupleTableSlotType;
>
> extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc,
> TupleTableSlotType st);
>

Can user define his own slot type?  If so, I assume that hard-coded
enum is a limitation of current prototype, and eventually we would have
an extendable array of slot types.  Is it correct?

You might notice that I've renamed a few fields (tts_values -> values,
> tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
> in the above structs / the attached