[ https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amy reassigned HAWQ-1371:
-------------------------

    Assignee: Amy  (was: Lei Chang)

> QE process hang in shared input scan
> ------------------------------------
>
>                 Key: HAWQ-1371
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1371
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Query Execution
>            Reporter: Amy
>            Assignee: Amy
>             Fix For: backlog
>
>
> A QE process hangs on a segment node while the QD and the QEs on other segment nodes have terminated.
> {code}
> on segment test2:
> [gpadmin@test2 ~]$ pp
> gpadmin 21614 0.0 1.2 788636 407428 ? Ss Feb26 1:19 /usr/local/hawq_2_1_0_0/bin/postgres -D /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-YARN/product/segmentdd -p 31100 --silent-mode=true -M segment -i
> gpadmin 21615 0.0 0.0 279896 6952 ? Ss Feb26 0:08 postgres: port 31100, logger process
> gpadmin 21618 0.0 0.0 282128 6980 ? Ss Feb26 0:00 postgres: port 31100, stats collector process
> gpadmin 21619 0.0 0.0 788636 7280 ? Ss Feb26 0:11 postgres: port 31100, writer process
> gpadmin 21620 0.0 0.0 788636 7064 ? Ss Feb26 0:01 postgres: port 31100, checkpoint process
> gpadmin 21621 0.0 0.0 793048 11752 ? S Feb26 0:19 postgres: port 31100, segment resource manager
> gpadmin 91760 0.0 0.0 861000 16840 ? TNsl Feb26 0:07 postgres: port 31100, gpadmin parquetola... 10.32.35.141(15250) con558 seg4 cmd2 slice11 MPPEXEC SELECT
> gpadmin 91762 0.0 0.0 861064 17116 ? SNsl Feb26 0:08 postgres: port 31100, gpadmin parquetola... 10.32.35.141(15253) con558 seg5 cmd2 slice11 MPPEXEC SELECT
> gpadmin 216648 0.0 0.0 103244 788 pts/0 S+ 19:54 0:00 grep postgres
> {code}
> The QE stack trace is:
> {code}
> (gdb) bt
> #0 0x00000032214e1523 in select () from /lib64/libc.so.6
> #1 0x000000000069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> #2 0x0000000000695798 in ExecEndMaterial (node=0x1d2eb50) at nodeMaterial.c:512
> #3 0x000000000067048d in ExecEndNode (node=0x1d2eb50) at execProcnode.c:1681
> #4 0x000000000069c6b5 in ExecEndShareInputScan (node=0x1d2e6f0) at nodeShareInputScan.c:382
> #5 0x000000000067042a in ExecEndNode (node=0x1d2e6f0) at execProcnode.c:1674
> #6 0x00000000006ac9be in ExecEndSequence (node=0x1d23890) at nodeSequence.c:165
> #7 0x00000000006705f0 in ExecEndNode (node=0x1d23890) at execProcnode.c:1583
> #8 0x000000000069a0ab in ExecEndResult (node=0x1d214a0) at nodeResult.c:481
> #9 0x000000000067060d in ExecEndNode (node=0x1d214a0) at execProcnode.c:1575
> #10 0x000000000069a0ab in ExecEndResult (node=0x1d20860) at nodeResult.c:481
> #11 0x000000000067060d in ExecEndNode (node=0x1d20860) at execProcnode.c:1575
> #12 0x0000000000698fd2 in ExecEndMotion (node=0x1d20320) at nodeMotion.c:1230
> #13 0x0000000000670434 in ExecEndNode (node=0x1d20320) at execProcnode.c:1713
> #14 0x0000000000669da7 in ExecEndPlan (planstate=0x1d20320, estate=0x1cb6b40) at execMain.c:2896
> #15 0x000000000066a311 in ExecutorEnd (queryDesc=0x1cabf20) at execMain.c:1407
> #16 0x00000000006195f2 in PortalCleanupHelper (portal=0x1cbcc40) at portalcmds.c:365
> #17 PortalCleanup (portal=0x1cbcc40) at portalcmds.c:317
> #18 0x0000000000900544 in AtAbort_Portals () at portalmem.c:693
> #19 0x00000000004e697f in AbortTransaction () at xact.c:2800
> #20 0x00000000004e7565 in AbortCurrentTransaction () at xact.c:3377
> #21 0x00000000007ed0fa in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=0x1b47f10 "gpadmin") at postgres.c:4630
> #22 0x00000000007a05d0 in BackendRun () at postmaster.c:5915
> #23 BackendStartup () at postmaster.c:5484
> #24 ServerLoop () at postmaster.c:2163
> #25 0x00000000007a3399 in PostmasterMain (argc=Unhandled dwarf expression opcode 0xf3
> ) at postmaster.c:1454
> #26 0x00000000004a52e9 in main (argc=9, argv=0x1b0cd10) at main.c:226
> (gdb) p CurrentTransactionState->state
> $1 = TRANS_ABORT
> (gdb) p pctxt->donefd
> No symbol "pctxt" in current context.
> (gdb) f 1
> #1 0x000000000069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> 989 nodeShareInputScan.c: No such file or directory.
> in nodeShareInputScan.c
> (gdb) p pctxt->donefd
> $2 = 15
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)