[jira] [Updated] (HAWQ-391) bug when failed to acquire resource from resource manager

Dong Li (JIRA) Tue, 02 Feb 2016 20:49:31 -0800

     [ https://issues.apache.org/jira/browse/HAWQ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dong Li updated HAWQ-391:
-------------------------
    Description: 
The cluster runs in YARN mode with 3 segments. 
I killed segments test3 and test4. The master detected that they were down, and 
table gp__segment_configuration marks them as down.

However, a query then fails with the following error:
<code>
2016-01-31 23:41:35.853835 
PST,,,p22876,th-373094208,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
log:
2016-01-31 23:41:35.853782, p22886, th139886711736512, INFO 
LibYarnClient::activeResources, activeResources 
finished",,,,,,,,"SysLoggerMain","syslogger.c",518,
2016-01-31 23:41:35.853861 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker submitted to activate 2 
containers.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1630,
2016-01-31 23:41:35.853877 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker allocated and activated container. ID : 605(2048 MB, 1 CORE) at 
test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.853895 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker allocated and activated container. ID : 606(2048 MB, 1 CORE) at 
test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.891660 
PST,"hawqsuperuser","gptest",p24354,th-373094208,"10.32.35.33","59700",2016-01-31
 23:31:35 PST,4161,con131,cmd2,seg-10000,,,x4161,sx1,"ERROR","XX000","failed to 
acquire resource from resource manager, queued resource request is timed out 
due to no resource (pquery.c:806)",,,,,,"insert into customer select * from 
ctas_customer;",0,,"pquery.c",806,"Stack trace:
1    0x87370a postgres errstart (elog.c:496)
2    0x7b46bf postgres AllocateResource (pquery.c:806)
3    0x9a7e1f postgres calculate_planner_segment_num (cdbdatalocality.c:4155)
4    0x727509 postgres planner (planner.c:501)
5    0x7adb5e postgres pg_plan_query (postgres.c:835)
6    0x7ae0d6 postgres <symbol not found> (postgres.c:907)
7    0x7aff32 postgres PostgresMain (postgres.c:4726)
8    0x763bd3 postgres <symbol not found> (postmaster.c:5890)
9    0x76433d postgres <symbol not found> (postmaster.c:2168)
10   0x76616e postgres PostmasterMain (postmaster.c:6520)
11   0x6c081a postgres main (main.c:226)
12   0x3c1e81ed1d libc.so.6 __libc_start_main (??:0)
13   0x4a2589 postgres <symbol not found> (??:0)
"
</code>
Segment test4 is down, but YARN still allocated containers on test4. This may 
be the cause of the error.
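
To check this suspicion against a larger log, the allocation messages can be compared with the list of segments known to be down. The following is a minimal sketch, not part of HAWQ: the function name `containers_on_down_hosts` is hypothetical, the message pattern is taken only from the log lines quoted above, and the set of down hosts is assumed to be supplied by hand (for example, copied from the segment configuration table).

```python
import re

# Pattern for the resource broker message quoted in the log above.
# Assumption: other HAWQ versions may word this message differently.
CONTAINER_RE = re.compile(
    r'allocated and activated container\. ID : (\d+)'
    r'\((\d+) MB, (\d+) CORE\) at (\S+?)\.(?="|\s|$)'
)


def containers_on_down_hosts(log_text, down_hosts):
    """Return (container_id, host, memory_mb, cores) for each container
    that the log reports as allocated on a host listed in down_hosts."""
    # The mail client wrapped the log lines, so collapse all whitespace
    # (including newlines) into single spaces before matching.
    normalized = " ".join(log_text.split())
    hits = []
    for match in CONTAINER_RE.finditer(normalized):
        container_id, mem_mb, cores, host = match.groups()
        if host in down_hosts:
            hits.append((int(container_id), host, int(mem_mb), int(cores)))
    return hits


if __name__ == "__main__":
    sample = (
        'PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000",'
        '"YARN mode \nresource broker allocated and activated container. '
        'ID : 605(2048 MB, 1 CORE) at \ntest4.",,,,,,,0,,'
        '"resourcebroker_LIBYARN_proc.c",1681,'
    )
    # Hosts assumed down, as described in this report.
    for hit in containers_on_down_hosts(sample, {"test3", "test4"}):
        print(hit)
```

If this reports any containers, the resource broker accepted capacity on a host that the master already considers down, which would be consistent with the timeout seen in the query.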


  was:
The cluster runs in YARN mode with 3 segments. I killed segments test3 and 
test4. The master detected that they were down, and table 
gp__segment_configuration marks them as down.

However, a query then fails with the following error:
<code>
2016-01-31 23:41:35.853835 
PST,,,p22876,th-373094208,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
log:
2016-01-31 23:41:35.853782, p22886, th139886711736512, INFO 
LibYarnClient::activeResources, activeResources 
finished",,,,,,,,"SysLoggerMain","syslogger.c",518,
2016-01-31 23:41:35.853861 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker submitted to activate 2 
containers.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1630,
2016-01-31 23:41:35.853877 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker allocated and activated container. ID : 605(2048 MB, 1 CORE) at 
test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.853895 
PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
resource broker allocated and activated container. ID : 606(2048 MB, 1 CORE) at 
test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
2016-01-31 23:41:35.891660 
PST,"hawqsuperuser","gptest",p24354,th-373094208,"10.32.35.33","59700",2016-01-31
 23:31:35 PST,4161,con131,cmd2,seg-10000,,,x4161,sx1,"ERROR","XX000","failed to 
acquire resource from resource manager, queued resource request is timed out 
due to no resource (pquery.c:806)",,,,,,"insert into customer select * from 
ctas_customer;",0,,"pquery.c",806,"Stack trace:
1    0x87370a postgres errstart (elog.c:496)
2    0x7b46bf postgres AllocateResource (pquery.c:806)
3    0x9a7e1f postgres calculate_planner_segment_num (cdbdatalocality.c:4155)
4    0x727509 postgres planner (planner.c:501)
5    0x7adb5e postgres pg_plan_query (postgres.c:835)
6    0x7ae0d6 postgres <symbol not found> (postgres.c:907)
7    0x7aff32 postgres PostgresMain (postgres.c:4726)
8    0x763bd3 postgres <symbol not found> (postmaster.c:5890)
9    0x76433d postgres <symbol not found> (postmaster.c:2168)
10   0x76616e postgres PostmasterMain (postmaster.c:6520)
11   0x6c081a postgres main (main.c:226)
12   0x3c1e81ed1d libc.so.6 __libc_start_main (??:0)
13   0x4a2589 postgres <symbol not found> (??:0)
"
</code>


> bug when failed to acquire resource from resource manager
> ---------------------------------------------------------
>
>                 Key: HAWQ-391
>                 URL: https://issues.apache.org/jira/browse/HAWQ-391
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Dong Li
>            Assignee: Lei Chang
>
> The cluster runs in YARN mode with 3 segments. 
> I killed segments test3 and test4. The master detected that they were down, 
> and table gp__segment_configuration marks them as down.
> However, a query then fails with the following error:
> <code>
> 2016-01-31 23:41:35.853835 
> PST,,,p22876,th-373094208,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
> log:
> 2016-01-31 23:41:35.853782, p22886, th139886711736512, INFO 
> LibYarnClient::activeResources, activeResources 
> finished",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2016-01-31 23:41:35.853861 
> PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
> resource broker submitted to activate 2 
> containers.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1630,
> 2016-01-31 23:41:35.853877 
> PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
> resource broker allocated and activated container. ID : 605(2048 MB, 1 CORE) 
> at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
> 2016-01-31 23:41:35.853895 
> PST,,,p22886,th-373094208,,,,0,con4,,seg-10000,,,,,"LOG","00000","YARN mode 
> resource broker allocated and activated container. ID : 606(2048 MB, 1 CORE) 
> at test4.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1681,
> 2016-01-31 23:41:35.891660 
> PST,"hawqsuperuser","gptest",p24354,th-373094208,"10.32.35.33","59700",2016-01-31
>  23:31:35 PST,4161,con131,cmd2,seg-10000,,,x4161,sx1,"ERROR","XX000","failed 
> to acquire resource from resource manager, queued resource request is timed 
> out due to no resource (pquery.c:806)",,,,,,"insert into customer select * 
> from ctas_customer;",0,,"pquery.c",806,"Stack trace:
> 1    0x87370a postgres errstart (elog.c:496)
> 2    0x7b46bf postgres AllocateResource (pquery.c:806)
> 3    0x9a7e1f postgres calculate_planner_segment_num (cdbdatalocality.c:4155)
> 4    0x727509 postgres planner (planner.c:501)
> 5    0x7adb5e postgres pg_plan_query (postgres.c:835)
> 6    0x7ae0d6 postgres <symbol not found> (postgres.c:907)
> 7    0x7aff32 postgres PostgresMain (postgres.c:4726)
> 8    0x763bd3 postgres <symbol not found> (postmaster.c:5890)
> 9    0x76433d postgres <symbol not found> (postmaster.c:2168)
> 10   0x76616e postgres PostmasterMain (postmaster.c:6520)
> 11   0x6c081a postgres main (main.c:226)
> 12   0x3c1e81ed1d libc.so.6 __libc_start_main (??:0)
> 13   0x4a2589 postgres <symbol not found> (??:0)
> "
> </code>
> Segment test4 is down, but YARN still allocated containers on test4. This may 
> be the cause of the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
