[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225460#comment-15225460
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

Github user yaoj2 commented on the pull request:

https://github.com/apache/incubator-hawq/pull/550#issuecomment-205573503
  
+1


> QD hangs when connecting to resource manager
> 
>
> Key: HAWQ-564
> URL: https://issues.apache.org/jira/browse/HAWQ-564
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Resource Manager
> Affects Versions: 2.0.0
> Reporter: Chunling Wang
> Assignee: Yi Jin
>
> When we first inject a panic into the QE process and run a query, the segment 
> goes down. After the segment comes back up, we run another query and get the 
> correct answer. Then we inject the same panic a second time. After the segment 
> goes down and comes back up again, we run a query and find that the QD process 
> hangs when connecting to the resource manager. Here is the backtrace when the 
> QD hangs:
> {code}
> * thread #1: tid = 0x21d8be, 0x7fff890355be libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x7fff890355be libsystem_kernel.dylib`poll + 10
>     frame #1: 0x000101daeafe postgres`processAllCommFileDescs + 158 at rmcomm_AsyncComm.c:156
>     frame #2: 0x000101db85f5 postgres`callSyncRPCRemote(hostname=0x7f9c19e00cd0, port=5437, sendbuff=0x7f9c1b918f50, sendbuffsize=80, sendmsgid=259, exprecvmsgid=2307, recvsmb=, errorbuf=0x00010230c1a0, errorbufsize=) + 645 at rmcomm_SyncComm.c:122
>     frame #3: 0x000101db2d85 postgres`acquireResourceFromRM [inlined] callSyncRPCToRM(sendbuff=0x7f9c1b918f50, sendbuffsize=, sendmsgid=259, exprecvmsgid=2307, recvsmb=0x7f9c1b918e70, errorbuf=, errorbufsize=1024) + 73 at rmcomm_QD2RM.c:2780
>     frame #4: 0x000101db2d3c postgres`acquireResourceFromRM(index=, sessionid=12, slice_size=462524016, iobytes=134217728, preferred_nodes=0x7f9c1a02d398, preferred_nodes_size=, max_seg_count_fix=, min_seg_count_fix=, errorbuf=, errorbufsize=) + 572 at rmcomm_QD2RM.c:742
>     frame #5: 0x000101c979e7 postgres`AllocateResource(life=QRL_ONCE, slice_size=5, iobytes=134217728, max_target_segment_num=1, min_target_segment_num=1, vol_info=0x7f9c1a02d398, vol_info_size=1) + 631 at pquery.c:796
>     frame #6: 0x000101e8c60f postgres`calculate_planner_segment_num(query=, resourceLife=QRL_ONCE, fullRangeTable=, intoPolicy=, sliceNum=5) + 14287 at cdbdatalocality.c:4207
>     frame #7: 0x000101c0f671 postgres`planner + 106 at planner.c:496
>     frame #8: 0x000101c0f607 postgres`planner(parse=0x7f9c1a02a140, cursorOptions=, boundParams=0x, resourceLife=QRL_ONCE) + 311 at planner.c:310
>     frame #9: 0x000101c8eb33 postgres`pg_plan_query(querytree=0x7f9c1a02a140, boundParams=0x, resource_life=QRL_ONCE) + 99 at postgres.c:837
>     frame #10: 0x000101c956ae postgres`exec_simple_query + 21 at postgres.c:911
>     frame #11: 0x000101c95699 postgres`exec_simple_query(query_string=0x7f9c1a028a30, seqServerHost=0x, seqServerPort=-1) + 1577 at postgres.c:1671
>     frame #12: 0x000101c91a4c postgres`PostgresMain(argc=, argv=, username=0x7f9c1b808cf0) + 9404 at postgres.c:4754
>     frame #13: 0x000101c4ae02 postgres`ServerLoop [inlined] BackendRun + 105 at postmaster.c:5889
>     frame #14: 0x000101c4ad99 postgres`ServerLoop at postmaster.c:5484
>     frame #15: 0x000101c4ad99 postgres`ServerLoop + 9593 at postmaster.c:2163
>     frame #16: 0x000101c47d3b postgres`PostmasterMain(argc=, argv=) + 5019 at postmaster.c:1454
>     frame #17: 0x000101bb1aa9 postgres`main(argc=9, argv=0x7f9c19c1eef0) + 1433 at main.c:209
>     frame #18: 0x7fff95e8c5c9 libdyld.dylib`start + 1
>   thread #2: tid = 0x21d8bf, 0x7fff890355be libsystem_kernel.dylib`poll + 10
>     frame #0: 0x7fff890355be libsystem_kernel.dylib`poll + 10
>     frame #1: 0x000101dfe723 postgres`rxThreadFunc(arg=) + 2163 at ic_udp.c:6251
>     frame #2: 0x7fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
>     frame #3: 0x7fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
>     frame #4: 0x7fff95e804b1 libsystem_pthread.dylib`thread_start + 13
>   thread #3: tid = 0x21d9c2, 0x7fff890343f6 libsystem_kernel.dylib`__select + 10
>     frame #0: 0x7fff890343f6 libsystem_kernel.dylib`__select + 10
>     frame #1: 0x000101e9d42e postgres`pg_usleep(microsec=) + 78 at pgsleep.c:43
>     frame #2: 0x000101db1a66 postgres`generateResourceRefreshHeartBeat(arg=0x7f9c19f02480) + 166 at rmcomm_QD2RM.c:1519
>     frame #3:

[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223619#comment-15223619
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

Github user linwen commented on the pull request:

https://github.com/apache/incubator-hawq/pull/550#issuecomment-205111616
  
+1



[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223220#comment-15223220
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

GitHub user jiny2 opened a pull request:

https://github.com/apache/incubator-hawq/pull/550

HAWQ-564. Resume resource dispatching when reset a RUAlive pending segment



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiny2/incubator-hawq HAWQ-564-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #550


commit 332eff11ff1ec11334ae0b8da2b7346a809ae4c3
Author: YI JIN 
Date:   2016-04-03T11:34:03Z

HAWQ-564. Resume resource dispatching when reset a RUAlive pending segment





[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220995#comment-15220995
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

Github user ictmalili commented on the pull request:

https://github.com/apache/incubator-hawq/pull/543#issuecomment-204207832
  
+1



[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220980#comment-15220980
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

Github user linwen commented on the pull request:

https://github.com/apache/incubator-hawq/pull/543#issuecomment-204204977
  
+1 



[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220977#comment-15220977
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

Github user yaoj2 commented on the pull request:

https://github.com/apache/incubator-hawq/pull/543#issuecomment-204203929
  
LGTM



[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220943#comment-15220943
 ] 

ASF GitHub Bot commented on HAWQ-564:
-

GitHub user jiny2 opened a pull request:

https://github.com/apache/incubator-hawq/pull/543

HAWQ-564. QD hangs when connecting to resource manager



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiny2/incubator-hawq HAWQ0564

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #543


commit 449951ad0e233f436388f11ffd06107343ce538a
Author: YI JIN 
Date:   2016-04-01T00:51:44Z

HAWQ-564. QD hangs when connecting to resource manager





[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-22 Thread Chunling Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206095#comment-15206095
 ] 

Chunling Wang commented on HAWQ-564:


'kill -6' can also cause the same result.
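
A minimal sketch of that variant, assuming the signal is meant for a QE process (the comment does not say which process was signalled). It also assumes QE backends can be recognised by a "con<N> seg<N>" tag in their process title, as in the `ps` listings elsewhere in this thread. The function names `list_qe_pids` and `abort_qes` are hypothetical helpers, not part of HAWQ.

```shell
#!/bin/sh
# Sketch only: locate HAWQ QE backends in `ps -ef` output and abort them.
# Assumption: a QE backend's title contains "con<N> seg<N>", while the QD
# shows "con<N> cmd<N>". The PID is the second column of `ps -ef`.

# Read `ps -ef` output on stdin and print one QE PID per line.
list_qe_pids() {
    awk '/postgres: port/ && / con[0-9]+ seg[0-9]+ / { print $2 }'
}

# Send SIGABRT (signal 6) to every QE backend that is currently running.
abort_qes() {
    ps -ef | list_qe_pids | while read -r pid; do
        kill -6 "$pid"
    done
}
```

After running a query so that QEs exist, calling `abort_qes` once and then re-running the query would correspond to one round of the scenario described in the issue.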

> QD hangs when connecting to resource manager
> 
>
> Key: HAWQ-564
> URL: https://issues.apache.org/jira/browse/HAWQ-564
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Resource Manager
>Affects Versions: 2.0.0
>Reporter: Chunling Wang
>Assignee: Lei Chang
>
> When we first inject a panic into a QE process and run a query, the segment 
> goes down. After the segment is back up, we run another query and get the 
> correct answer. Then we inject the same panic a second time. After the 
> segment goes down and comes back up again, we run a query and find that the 
> QD process hangs when connecting to the resource manager. Here is the 
> backtrace when the QD hangs:
> {code}
> * thread #1: tid = 0x21d8be, 0x7fff890355be libsystem_kernel.dylib`poll + 
> 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x7fff890355be libsystem_kernel.dylib`poll + 10
> frame #1: 0x000101daeafe postgres`processAllCommFileDescs + 158 at 
> rmcomm_AsyncComm.c:156
> frame #2: 0x000101db85f5 
> postgres`callSyncRPCRemote(hostname=0x7f9c19e00cd0, port=5437, 
> sendbuff=0x7f9c1b918f50, sendbuffsize=80, sendmsgid=259, 
> exprecvmsgid=2307, recvsmb=, errorbuf=0x00010230c1a0, 
> errorbufsize=) + 645 at rmcomm_SyncComm.c:122
> frame #3: 0x000101db2d85 postgres`acquireResourceFromRM [inlined] 
> callSyncRPCToRM(sendbuff=0x7f9c1b918f50, sendbuffsize=, 
> sendmsgid=259, exprecvmsgid=2307, recvsmb=0x7f9c1b918e70, 
> errorbuf=, errorbufsize=1024) + 73 at rmcomm_QD2RM.c:2780
> frame #4: 0x000101db2d3c 
> postgres`acquireResourceFromRM(index=, sessionid=12, 
> slice_size=462524016, iobytes=134217728, preferred_nodes=0x7f9c1a02d398, 
> preferred_nodes_size=, max_seg_count_fix=, 
> min_seg_count_fix=, errorbuf=, 
> errorbufsize=) + 572 at rmcomm_QD2RM.c:742
> frame #5: 0x000101c979e7 postgres`AllocateResource(life=QRL_ONCE, 
> slice_size=5, iobytes=134217728, max_target_segment_num=1, 
> min_target_segment_num=1, vol_info=0x7f9c1a02d398, vol_info_size=1) + 631 
> at pquery.c:796
> frame #6: 0x000101e8c60f 
> postgres`calculate_planner_segment_num(query=, 
> resourceLife=QRL_ONCE, fullRangeTable=, 
> intoPolicy=, sliceNum=5) + 14287 at cdbdatalocality.c:4207
> frame #7: 0x000101c0f671 postgres`planner + 106 at planner.c:496
> frame #8: 0x000101c0f607 postgres`planner(parse=0x7f9c1a02a140, 
> cursorOptions=, boundParams=0x, 
> resourceLife=QRL_ONCE) + 311 at planner.c:310
> frame #9: 0x000101c8eb33 
> postgres`pg_plan_query(querytree=0x7f9c1a02a140, 
> boundParams=0x, resource_life=QRL_ONCE) + 99 at postgres.c:837
> frame #10: 0x000101c956ae postgres`exec_simple_query + 21 at 
> postgres.c:911
> frame #11: 0x000101c95699 
> postgres`exec_simple_query(query_string=0x7f9c1a028a30, 
> seqServerHost=0x, seqServerPort=-1) + 1577 at postgres.c:1671
> frame #12: 0x000101c91a4c postgres`PostgresMain(argc=, 
> argv=, username=0x7f9c1b808cf0) + 9404 at postgres.c:4754
> frame #13: 0x000101c4ae02 postgres`ServerLoop [inlined] BackendRun + 
> 105 at postmaster.c:5889
> frame #14: 0x000101c4ad99 postgres`ServerLoop at postmaster.c:5484
> frame #15: 0x000101c4ad99 postgres`ServerLoop + 9593 at 
> postmaster.c:2163
> frame #16: 0x000101c47d3b postgres`PostmasterMain(argc=, 
> argv=) + 5019 at postmaster.c:1454
> frame #17: 0x000101bb1aa9 postgres`main(argc=9, 
> argv=0x7f9c19c1eef0) + 1433 at main.c:209
> frame #18: 0x7fff95e8c5c9 libdyld.dylib`start + 1
>   thread #2: tid = 0x21d8bf, 0x7fff890355be libsystem_kernel.dylib`poll + 
> 10
> frame #0: 0x7fff890355be libsystem_kernel.dylib`poll + 10
> frame #1: 0x000101dfe723 postgres`rxThreadFunc(arg=) + 
> 2163 at ic_udp.c:6251
> frame #2: 0x7fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
> frame #3: 0x7fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
> frame #4: 0x7fff95e804b1 libsystem_pthread.dylib`thread_start + 13
>   thread #3: tid = 0x21d9c2, 0x7fff890343f6 
> libsystem_kernel.dylib`__select + 10
> frame #0: 0x7fff890343f6 libsystem_kernel.dylib`__select + 10
> frame #1: 0x000101e9d42e postgres`pg_usleep(microsec=) + 
> 78 at pgsleep.c:43
> frame #2: 0x000101db1a66 
> postgres`generateResourceRefreshHeartBeat(arg=0x7f9c19f02480) + 166 at 
> rmcomm_QD2RM.c:1519
> frame #3: 0x7fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
> frame #4: 0x7fff95e82279 

[jira] [Commented] (HAWQ-564) QD hangs when connecting to resource manager

2016-03-21 Thread Chunling Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203864#comment-15203864
 ] 

Chunling Wang commented on HAWQ-564:


There is another way to trigger this bug without fault injection.
1. First, run a query so that some QEs are started.
{code}
dispatch=# select count(*) from test_dispatch as t1, test_dispatch as t2, 
test_dispatch as t3 where t1.id *2 = t2.id and t1.id < t3.id;
 count
---
  3725
(1 row)
{code}

{code}
$ ps -ef|grep postgres
  501 30190 1   0  2:34PM ?? 0:00.31 /usr/local/hawq/bin/postgres 
-D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432 
--silent-mode=true
  501 30191 30190   0  2:34PM ?? 0:00.01 postgres: port  5432, master 
logger process
  501 30194 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, stats 
collector process
  501 30195 30190   0  2:34PM ?? 0:00.01 postgres: port  5432, writer 
process
  501 30196 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, 
checkpoint process
  501 30197 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, 
seqserver process
  501 30198 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, WAL Send 
Server process
  501 30199 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, DFS 
Metadata Cache process
  501 30200 30190   0  2:34PM ?? 0:00.07 postgres: port  5432, master 
resource manager
  501 30216 1   0  2:34PM ?? 0:00.37 /usr/local/hawq/bin/postgres 
-D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 4 
--silent-mode=true
  501 30217 30216   0  2:34PM ?? 0:00.02 postgres: port 4, logger 
process
  501 30220 30216   0  2:34PM ?? 0:00.00 postgres: port 4, stats 
collector process
  501 30221 30216   0  2:34PM ?? 0:00.01 postgres: port 4, writer 
process
  501 30222 30216   0  2:34PM ?? 0:00.00 postgres: port 4, 
checkpoint process
  501 30223 30216   0  2:34PM ?? 0:00.03 postgres: port 4, segment 
resource manager
  501 30231 30190   0  2:35PM ?? 0:00.03 postgres: port  5432, 
wangchunling dispatch [local] con12 cmd6 idle [local]
  501 30235 30216   0  2:35PM ?? 0:00.13 postgres: port 4, 
wangchunling dispatch 127.0.0.1(65051) con12 seg0 idle
  501 30239 30216   0  2:35PM ?? 0:00.06 postgres: port 4, 
wangchunling dispatch 127.0.0.1(65061) con12 seg0 idle
  501 30240 30216   0  2:35PM ?? 0:00.06 postgres: port 4, 
wangchunling dispatch 127.0.0.1(65063) con12 seg0 idle
  501 30242 99560   0  2:36PM ttys000 0:00.00 grep postgres
{code}

2. Kill one QE; afterwards no QE processes remain.
{code}
$ kill -9 30235
$ ps -ef|grep postgres
  501 30190 1   0  2:34PM ?? 0:00.32 /usr/local/hawq/bin/postgres 
-D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432 
--silent-mode=true
  501 30191 30190   0  2:34PM ?? 0:00.01 postgres: port  5432, master 
logger process
  501 30194 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, stats 
collector process
  501 30195 30190   0  2:34PM ?? 0:00.01 postgres: port  5432, writer 
process
  501 30196 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, 
checkpoint process
  501 30197 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, 
seqserver process
  501 30198 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, WAL Send 
Server process
  501 30199 30190   0  2:34PM ?? 0:00.00 postgres: port  5432, DFS 
Metadata Cache process
  501 30200 30190   0  2:34PM ?? 0:00.08 postgres: port  5432, master 
resource manager
  501 30216 1   0  2:34PM ?? 0:00.58 /usr/local/hawq/bin/postgres 
-D /Users/wangchunling/hawq-data-directory/segmentdd -i -M segment -p 4 
--silent-mode=true
  501 30217 30216   0  2:34PM ?? 0:00.03 postgres: port 4, logger 
process
  501 30231 30190   0  2:35PM ?? 0:00.04 postgres: port  5432, 
wangchunling dispatch [local] con12 cmd6 idle [local]
  501 30248 30216   0  2:36PM ?? 0:00.00 postgres: port 4, stats 
collector process
  501 30249 30216   0  2:36PM ?? 0:00.00 postgres: port 4, writer 
process
  501 30250 30216   0  2:36PM ?? 0:00.00 postgres: port 4, 
checkpoint process
  501 30251 30216   0  2:36PM ?? 0:00.00 postgres: port 4, segment 
resource manager
  501 30255 99560   0  2:36PM ttys000 0:00.00 grep postgres
{code}
3. Run the query again so that some new QEs are started.
{code}
dispatch=# select count(*) from test_dispatch as t1, test_dispatch as t2, 
test_dispatch as t3 where t1.id *2 = t2.id and t1.id < t3.id;
 count
---
  3725
(1 row)
{code}

{code}
$ ps -ef|grep postgres
  501 30190 1   0  2:34PM ?? 0:00.33 /usr/local/hawq/bin/postgres 
-D /Users/wangchunling/hawq-data-directory/masterdd -i -M master -p 5432 
--silent-mode=true
  501 30191 30190   0  2:34PM ?? 0:00.01 postgres: port