[ https://issues.apache.org/jira/browse/HAWQ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Li updated HAWQ-391:
-------------------------
    Description:
The cluster is in yarn mode with 3 segments.
I killed segments test3 and test4. The master detected that they were down, and table gp__segment_configuration marked them as down.
However, a subsequent query fails with:
<code>
2016-01-31 23:41:35.853835 PST,,,p22876,th-373094208,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
2016-01-31 23:41:35.853782, p22886, th139886711736512, INFO LibYarnClient::activeResources, activeResources finished",,,,,,,,"SysLoggerMain","syslogger.c",518,
2016-01-31 23:41:35.853861 PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode resource broker submitted to activate 2 containers.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1630,
2016-01-31 23:41:35.853877 PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode resource broker allocated and activated container. ID : 605(2048 MB, 1 CORE) at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.853895 PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode resource broker allocated and activated container. ID : 606(2048 MB, 1 CORE) at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.891660 PST,"hawqsuperuser","gptest",p24354,th-373094208,"10.32.35.33","59700",2016-01-31 23:31:35 PST,4161,con131,cmd2,seg-10000,,,x4161,sx1,"ERROR","XX000","failed to acquire resource from resource manager, queued resource request is timed out due to no resource (pquery.c:806)",,,,,,"insert into customer select * from ctas_customer;",0,,"pquery.c",806,"Stack trace:
1 0x87370a postgres errstart (elog.c:496)
2 0x7b46bf postgres AllocateResource (pquery.c:806)
3 0x9a7e1f postgres calculate_planner_segment_num (cdbdatalocality.c:4155)
4 0x727509 postgres planner (planner.c:501)
5 0x7adb5e postgres pg_plan_query (postgres.c:835)
6 0x7ae0d6 postgres <symbol not found> (postgres.c:907)
7 0x7aff32 postgres PostgresMain (postgres.c:4726)
8 0x763bd3 postgres <symbol not found> (postmaster.c:5890)
9 0x76433d postgres <symbol not found> (postmaster.c:2168)
10 0x76616e postgres PostmasterMain (postmaster.c:6520)
11 0x6c081a postgres main (main.c:226)
12 0x3c1e81ed1d libc.so.6 __libc_start_main (??:0)
13 0x4a2589 postgres <symbol not found> (??:0)
"
<code>
Segment test4 is down, but YARN still allocated containers at test4. This may be why the resource request times out.

> bug when failed to acquire resource from resource manager
> ---------------------------------------------------------
>
>                 Key: HAWQ-391
>                 URL: https://issues.apache.org/jira/browse/HAWQ-391
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Dong Li
>            Assignee: Lei Chang
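
The suspicion raised in the description, that containers were activated on a segment already marked down, can be checked against a log excerpt with a small script. The following is only a minimal sketch: the function name `containers_on_down_hosts` and the constant `SAMPLE_LOG` are hypothetical, the message pattern is taken from the two "allocated and activated container" lines quoted above, and the set of down hosts is assumed to be supplied by the caller (for example, copied by hand from the segment configuration table). It is not part of HAWQ.

```python
import re

# Pattern for the resource broker message as it appears in the quoted log.
# Whitespace is matched loosely because mail clients may wrap the line.
CONTAINER_RE = re.compile(
    r"allocated and activated container\.\s+"
    r"ID\s*:\s*(\d+)\((\d+) MB,\s+(\d+) CORE\)\s+at\s+([\w.-]+)\."
)

# Two records copied from the log excerpt in this issue (illustration only).
SAMPLE_LOG = '''
2016-01-31 23:41:35.853877 PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode resource broker allocated and activated container. ID : 605(2048 MB, 1 CORE) at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.853895 PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode resource broker allocated and activated container. ID : 606(2048 MB, 1 CORE) at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
'''


def containers_on_down_hosts(log_text, down_hosts):
    """Return (container_id, host) pairs for containers activated on a host
    that the caller considers down. `down_hosts` is any set of host names."""
    hits = []
    for match in CONTAINER_RE.finditer(log_text):
        container_id = int(match.group(1))
        host = match.group(4)
        if host in down_hosts:
            hits.append((container_id, host))
    return hits


if __name__ == "__main__":
    # test3 and test4 are the segments the reporter killed.
    for container_id, host in containers_on_down_hosts(SAMPLE_LOG, {"test3", "test4"}):
        print(f"container {container_id} was activated on down host {host}")
```

Run against the excerpt in this report with test3 and test4 as the down hosts, the sketch flags containers 605 and 606, which matches the reporter's observation.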