[ 
https://issues.apache.org/jira/browse/HAWQ-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-166:
-----------------------------
    Attachment: olap_setup.sql

> Orphaned share input scan QE process after OLAP grouping query
> --------------------------------------------------------------
>
>                 Key: HAWQ-166
>                 URL: https://issues.apache.org/jira/browse/HAWQ-166
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Query Execution
>            Reporter: Ruilong Huo
>            Assignee: George Caragea
>         Attachments: olap_setup.sql
>
>
> Some OLAP grouping query may error out with "division by zero", and then 
> leaves orphaned share input scan QE process on segment hosts. It causes HAWQ 
> fails to shutdown, restart, etc.
> {code}
> [gpadmin@test2 ~]$ ps -ef | grep postgres
> gpadmin  115703      1  0 Nov15 ?        00:02:32 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-dbg-mutilnode-huor/product/hawq-2.0.0.0/bin/postgres
>  -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-dbg-mutilnode-huor/product/segmentdd
>  -i -M segment -p 21100 --silent-mode=true
> gpadmin  115704 115703  0 Nov15 ?        00:06:10 postgres: port 21100, 
> logger process
> gpadmin  115707 115703  0 Nov15 ?        00:00:01 postgres: port 21100, stats 
> collector process
> gpadmin  115708 115703  0 Nov15 ?        00:00:17 postgres: port 21100, 
> writer process
> gpadmin  115709 115703  0 Nov15 ?        00:00:03 postgres: port 21100, 
> checkpoint process
> gpadmin  115710 115703  0 Nov15 ?        00:00:38 postgres: port 21100, 
> segment resource manager
> gpadmin  118013 115703  0 Nov15 ?        00:00:28 postgres: port 21100, 
> gpadmin olap_group 10.32.35.205(46459) con58 seg1 cmd2 slice10 MPPEXEC SELECT
> gpadmin  118611 115703  0 Nov15 ?        00:00:28 postgres: port 21100, 
> gpadmin olap_group 10.32.35.205(47395) con63 seg5 cmd2 slice36 MPPEXEC SELECT
> gpadmin  173633 115703  0 Nov15 ?        00:00:28 postgres: port 21100, 
> gpadmin olap_group 10.32.35.205(3398) con616 seg3 cmd2 slice11 MPPEXEC SELECT
> gpadmin  180033 115703  0 Nov15 ?        00:00:28 postgres: port 21100, 
> gpadmin olap_group 10.32.35.205(13067) con676 seg1 cmd2 slice16 MPPEXEC SELECT
> gpadmin  223009 115703  0 Nov15 ?        00:00:27 postgres: port 21100, 
> gpadmin olap_group 10.32.35.205(13860) con1098 seg7 cmd2 slice21 MPPEXEC 
> SELECT
> gpadmin  414471 414449  0 17:46 pts/1    00:00:00 grep postgres
> [gpadmin@test2 ~]$ sudo gdb -p 173633
> (gdb) bt
> #0  0x0000003c1e8e15e3 in select () from /lib64/libc.so.6
> #1  0x000000000076724d in shareinput_writer_waitdone (ctxt=0x2b6cc10, 
> share_id=0, nsharer_xslice=4) at nodeShareInputScan.c:1011
> #2  0x000000000075b241 in ExecEndMaterial (node=0x2a98700) at 
> nodeMaterial.c:496
> #3  0x00000000007250b0 in ExecEndNode (node=0x2a98700) at execProcnode.c:1690
> #4  0x000000000076539c in ExecEndShareInputScan (node=0x2a98250) at 
> nodeShareInputScan.c:367
> #5  0x000000000072506a in ExecEndNode (node=0x2a98250) at execProcnode.c:1681
> #6  0x00000000007684e0 in ExecEndSort (node=0x2a97c20) at nodeSort.c:588
> #7  0x00000000007250f6 in ExecEndNode (node=0x2a97c20) at execProcnode.c:1696
> #8  0x000000000074b0c1 in ExecEndAgg (node=0x2a61df8) at nodeAgg.c:2644
> #9  0x0000000000725107 in ExecEndNode (node=0x2a61df8) at execProcnode.c:1700
> #10 0x00000000007639c1 in ExecEndRepeat (node=0x2a608e0) at nodeRepeat.c:218
> #11 0x000000000072517e in ExecEndNode (node=0x2a608e0) at execProcnode.c:1728
> #12 0x0000000000745ad5 in ExecEndAppend (node=0x2a603b8) at nodeAppend.c:411
> #13 0x0000000000724ec9 in ExecEndNode (node=0x2a603b8) at execProcnode.c:1582
> #14 0x0000000000745ad5 in ExecEndAppend (node=0x2a5ff08) at nodeAppend.c:411
> #15 0x0000000000724ec9 in ExecEndNode (node=0x2a5ff08) at execProcnode.c:1582
> #16 0x0000000000760997 in ExecEndMotion (node=0x2a79950) at nodeMotion.c:1232
> #17 0x000000000072516d in ExecEndNode (node=0x2a79950) at execProcnode.c:1724
> #18 0x000000000071c9e9 in ExecEndPlan (planstate=0x2a79950, estate=0x29d4318) 
> at execMain.c:2891
> #19 0x0000000000719952 in ExecutorEnd (queryDesc=0x29cc208) at execMain.c:1365
> #20 0x00000000006b452b in PortalCleanupHelper (portal=0x29e2528) at 
> portalcmds.c:347
> #21 0x00000000006b4454 in PortalCleanup (portal=0x29e2528) at portalcmds.c:299
> #22 0x0000000000a1c8ca in AtAbort_Portals () at portalmem.c:677
> #23 0x0000000000521b9b in AbortTransaction () at xact.c:2819
> #24 0x0000000000522753 in AbortCurrentTransaction () at xact.c:3332
> #25 0x00000000008fc43d in PostgresMain (argc=276, argv=0x2912038, 
> username=0x28f57f8 "gpadmin") at postgres.c:4469
> #26 0x000000000089f48a in BackendRun (port=0x28bde30) at postmaster.c:5844
> #27 0x000000000089e914 in BackendStartup (port=0x28bde30) at postmaster.c:5437
> #28 0x0000000000899032 in ServerLoop () at postmaster.c:2139
> #29 0x0000000000898108 in PostmasterMain (argc=9, argv=0x28a6130) at 
> postmaster.c:1431
> #30 0x00000000007b199a in main (argc=9, argv=0x28a6130) at main.c:226
> {code}
> The reproduction steps are:
> {code}
> Step 1: Prepare schema and data using attached olap_setup.sql
> Step 2: Run below OLAP grouping query
> -- OLAP query involving MAX() function
> SELECT sale.vn,sale.cn,sale.dt,GROUPING(sale.vn), 
> TO_CHAR(COALESCE(MAX(DISTINCT 
> floor(sale.vn+sale.qty)),0),'99999999.9999999'),TO_CHAR(COALESCE(VAR_SAMP(floor(sale.pn/sale.prc)),0),'99999999.9999999'),TO_CHAR(COALESCE(COUNT(floor(sale.qty+sale.prc)),0),'99999999.9999999')
> FROM sale,customer,vendor
> WHERE sale.cn=customer.cn AND sale.vn=vendor.vn
> GROUP BY 
> ROLLUP((sale.prc),(sale.vn,sale.vn),(sale.pn,sale.pn),(sale.dt),(sale.qty,sale.vn,sale.qty)),ROLLUP((sale.pn),(sale.vn,sale.pn),(sale.qty)),(),sale.cn
>  HAVING COALESCE(VAR_POP(sale.cn),0) >= 45.5839785564113;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to