Re: [HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-13 Thread Robert Haas
On Thu, Oct 12, 2017 at 9:14 AM, Dilip Kumar  wrote:
>> Yep, this fixes the failures for me.
>>
> Thanks for confirming.

Committed and back-patched to v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-12 Thread Dilip Kumar
On Thu, Oct 12, 2017 at 6:37 PM, Tomas Vondra
 wrote:
>
>
> On 10/12/2017 02:40 PM, Dilip Kumar wrote:
>> On Thu, Oct 12, 2017 at 4:31 PM, Tomas Vondra
>>  wrote:
>>> Hi,
>>>
>>> It seems that Q19 from TPC-H is consistently failing with segfaults due
>>> to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).
>>>
>>> I'm not very familiar with how the dsa is initialized and passed around,
>>> but I only see the failures when the bitmap is constructed by a mix of
>>> BitmapAnd and BitmapOr operations.
>>>
>> I think I have found the issue: bitmap_subplan_mark_shared does not
>> properly push the isshared flag down to the lower-level bitmap index
>> node, and because of that tbm_create is passed a NULL dsa when the
>> tidbitmap is created.  So the problem occurs only in a very specific
>> combination of BitmapOr and BitmapAnd, when a BitmapAnd is the first
>> subplan of the BitmapOr.  If a BitmapIndexScan is the first subplan
>> under the BitmapOr, there is no problem, because the BitmapOr node
>> creates the tbm itself and isshared is set for the BitmapOr.
>>
>> The attached patch fixes the issue for me.  I will test it thoroughly
>> with other scenarios as well.  Thanks for reporting.
>>
>
> Yep, this fixes the failures for me.
>
Thanks for confirming.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-12 Thread Tomas Vondra


On 10/12/2017 02:40 PM, Dilip Kumar wrote:
> On Thu, Oct 12, 2017 at 4:31 PM, Tomas Vondra
>  wrote:
>> Hi,
>>
>> It seems that Q19 from TPC-H is consistently failing with segfaults due
>> to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).
>>
>> I'm not very familiar with how the dsa is initialized and passed around,
>> but I only see the failures when the bitmap is constructed by a mix of
>> BitmapAnd and BitmapOr operations.
>>
> I think I have found the issue: bitmap_subplan_mark_shared does not
> properly push the isshared flag down to the lower-level bitmap index
> node, and because of that tbm_create is passed a NULL dsa when the
> tidbitmap is created.  So the problem occurs only in a very specific
> combination of BitmapOr and BitmapAnd, when a BitmapAnd is the first
> subplan of the BitmapOr.  If a BitmapIndexScan is the first subplan
> under the BitmapOr, there is no problem, because the BitmapOr node
> creates the tbm itself and isshared is set for the BitmapOr.
>
> The attached patch fixes the issue for me.  I will test it thoroughly
> with other scenarios as well.  Thanks for reporting.
> 

Yep, this fixes the failures for me.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-12 Thread Dilip Kumar
On Thu, Oct 12, 2017 at 4:31 PM, Tomas Vondra
 wrote:
> Hi,
>
> It seems that Q19 from TPC-H is consistently failing with segfaults due
> to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).
>
> I'm not very familiar with how the dsa is initialized and passed around,
> but I only see the failures when the bitmap is constructed by a mix of
> BitmapAnd and BitmapOr operations.
>
I think I have found the issue: bitmap_subplan_mark_shared does not
properly push the isshared flag down to the lower-level bitmap index
node, and because of that tbm_create is passed a NULL dsa when the
tidbitmap is created.  So the problem occurs only in a very specific
combination of BitmapOr and BitmapAnd, when a BitmapAnd is the first
subplan of the BitmapOr.  If a BitmapIndexScan is the first subplan
under the BitmapOr, there is no problem, because the BitmapOr node
creates the tbm itself and isshared is set for the BitmapOr.

The attached patch fixes the issue for me.  I will test it thoroughly
with other scenarios as well.  Thanks for reporting.
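
To make the explanation above concrete, here is a rough toy model in C.
It is not PostgreSQL source: ToyPlan, mark_shared_unpatched,
mark_shared_patched, bitmap_created_shared and demo_tree_is_shared are
hypothetical names, and each node keeps only its first child.  Only the
branch structure of the two marking functions follows the attached
patch; the rule for which node creates the bitmap is an assumption taken
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the plan node types; NOT the PostgreSQL structs. */
typedef enum { T_BitmapAnd, T_BitmapOr, T_BitmapIndexScan } ToyTag;

typedef struct ToyPlan
{
	ToyTag		tag;
	bool		isshared;		/* never set on a BitmapAnd in this model */
	struct ToyPlan *first;		/* first subplan only; NULL for an index scan */
} ToyPlan;

/* Marking before the patch: a BitmapOr is flagged but not descended into. */
void
mark_shared_unpatched(ToyPlan *plan)
{
	if (plan->tag == T_BitmapAnd)
		mark_shared_unpatched(plan->first);
	else if (plan->tag == T_BitmapOr)
		plan->isshared = true;
	else						/* T_BitmapIndexScan */
		plan->isshared = true;
}

/* Marking after the patch: a BitmapOr also marks its first subplan. */
void
mark_shared_patched(ToyPlan *plan)
{
	if (plan->tag == T_BitmapAnd)
		mark_shared_patched(plan->first);
	else if (plan->tag == T_BitmapOr)
	{
		plan->isshared = true;
		mark_shared_patched(plan->first);
	}
	else						/* T_BitmapIndexScan */
		plan->isshared = true;
}

/*
 * Assumed rule (from the explanation above): a BitmapOr creates the bitmap
 * itself only when its first subplan is an index scan; otherwise the bitmap
 * is created by the first index scan found by following first subplans.
 * Returns whether the creating node carries isshared, i.e. whether the
 * bitmap would be created with a non-NULL dsa.
 */
bool
bitmap_created_shared(const ToyPlan *plan)
{
	assert(plan != NULL);
	if (plan->tag == T_BitmapIndexScan)
		return plan->isshared;
	if (plan->tag == T_BitmapOr && plan->first->tag == T_BitmapIndexScan)
		return plan->isshared;
	return bitmap_created_shared(plan->first);
}

/* Builds BitmapOr -> BitmapAnd -> BitmapIndexScan and marks it. */
bool
demo_tree_is_shared(bool patched)
{
	ToyPlan		scan = {T_BitmapIndexScan, false, NULL};
	ToyPlan		and_node = {T_BitmapAnd, false, &scan};
	ToyPlan		or_node = {T_BitmapOr, false, &and_node};

	if (patched)
		mark_shared_patched(&or_node);
	else
		mark_shared_unpatched(&or_node);
	return bitmap_created_shared(&or_node);
}
```

In this model demo_tree_is_shared(false) returns false, which corresponds
to the NULL dsa case, and demo_tree_is_shared(true) returns true.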


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5c934f2..cc7590b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4922,7 +4922,11 @@ bitmap_subplan_mark_shared(Plan *plan)
 		bitmap_subplan_mark_shared(
    linitial(((BitmapAnd *) plan)->bitmapplans));
 	else if (IsA(plan, BitmapOr))
+	{
 		((BitmapOr *) plan)->isshared = true;
+		bitmap_subplan_mark_shared(
+  linitial(((BitmapOr *) plan)->bitmapplans));
+	}
 	else if (IsA(plan, BitmapIndexScan))
 		((BitmapIndexScan *) plan)->isshared = true;
 	else



[HACKERS] Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10

2017-10-12 Thread Tomas Vondra
Hi,

It seems that Q19 from TPC-H is consistently failing with segfaults due
to calling tbm_prepare_shared_iterate() with (tbm->dsa==NULL).

I'm not very familiar with how the dsa is initialized and passed around,
but I only see the failures when the bitmap is constructed by a mix of
BitmapAnd and BitmapOr operations.

Another interesting observation is that setting force_parallel_mode=on
may not be enough; there really need to be multiple parallel workers,
which is why the minimal query sets cpu_tuple_cost=1.

Attached is a bunch of files:

1) details for "full" query:

* query.sql
* plan.txt
* backtrace.txt

2) details for the "minimal" query triggering the issue:

* query-minimal.sql
* plan-minimal.txt
* backtrace-minimal.txt



regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Program terminated with signal 6, Aborted.
#0  0x7fe21265d1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64
(gdb) bt
#0  0x7fe21265d1f7 in raise () from /lib64/libc.so.6
#1  0x7fe21265e8e8 in abort () from /lib64/libc.so.6
#2  0x008468e7 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9d213a "!(tbm->dsa != ((void *)0))", 
errorType=errorType@entry=0x88fc69 "FailedAssertion", 
fileName=fileName@entry=0x9d2014 "tidbitmap.c", 
lineNumber=lineNumber@entry=800) at assert.c:54
#3  0x0065b04f in tbm_prepare_shared_iterate (tbm=tbm@entry=0x2b244e8) 
at tidbitmap.c:800
#4  0x0062294a in BitmapHeapNext (node=node@entry=0x2adf118) at 
nodeBitmapHeapscan.c:155
#5  0x00616d7a in ExecScanFetch (recheckMtd=0x623050 
, accessMtd=0x622250 , node=0x2adf118) at 
execScan.c:97
#6  ExecScan (node=0x2adf118, accessMtd=0x622250 , 
recheckMtd=0x623050 ) at execScan.c:147
#7  0x00624c75 in ExecProcNode (node=0x2adf118) at 
../../../src/include/executor/executor.h:250
#8  gather_getnext (gatherstate=0x2aded50) at nodeGather.c:281
#9  ExecGather (pstate=0x2aded50) at nodeGather.c:215
#10 0x00610d12 in ExecProcNode (node=0x2aded50) at 
../../../src/include/executor/executor.h:250
#11 ExecutePlan (execute_once=, dest=0x2b09220, 
direction=, numberTuples=0, sendTuples=1 '\001', 
operation=CMD_SELECT, use_parallel_mode=, planstate=0x2aded50, 
estate=0x2adeb00) at execMain.c:1721
#12 standard_ExecutorRun (queryDesc=0x2a3bdf0, direction=, 
count=0, execute_once=) at execMain.c:363
#13 0x0074b50b in PortalRunSelect (portal=portal@entry=0x2a34050, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x2b09220) at pquery.c:932
#14 0x0074ca18 in PortalRun (portal=portal@entry=0x2a34050, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x2b09220, 
altdest=altdest@entry=0x2b09220, 
completionTag=completionTag@entry=0x7ffc8dad21c0 "") at pquery.c:773
#15 0x0074875b in exec_simple_query (
query_string=0x2a96ff0 "select\n*\nfrom\npart\nwhere\n(\n   
 p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')\nand p_size 
between 1 and 5\n)\nor\n(\np_container in ('MED BAG', 'MED 
B"...) at postgres.c:1099
#16 0x00749a03 in PostgresMain (argc=, 
argv=argv@entry=0x2a44048, dbname=0x2a43eb0 "test", username=) 
at postgres.c:4088
#17 0x0047665f in BackendRun (port=0x2a37cc0) at postmaster.c:4357
#18 BackendStartup (port=0x2a37cc0) at postmaster.c:4029
#19 ServerLoop () at postmaster.c:1753
#20 0x006d70d9 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x2a14b20) at postmaster.c:1361
#21 0x004774c1 in main (argc=3, argv=0x2a14b20) at main.c:228

                                QUERY PLAN

 Gather
   Workers Planned: 2
   ->  Parallel Bitmap Heap Scan on part
         Recheck Cond: (((p_size <= 5) AND (p_size >= 1) AND (p_container = ANY ('{"SM CASE","SM BOX","SM PACK","SM PKG"}'::bpchar[]))) OR ((p_container = ANY ('{"MED BAG","MED BOX","MED PKG","MED PACK"}'::bpchar[])) AND (p_size <= 10) AND (p_size >= 1)))
         ->  BitmapOr
               ->  BitmapAnd
                     ->  Bitmap Index Scan on part_p_size_idx
                           Index Cond: ((p_size <= 5) AND (p_size >= 1))
                     ->  Bitmap Index Scan on part_p_container_p_brand_p_partkey_idx
                           Index Cond: (p_container = ANY ('{"SM