Dong Li created HAWQ-272:
----------------------------

             Summary: Segment status will not be down after killing postmaster process of segment
                 Key: HAWQ-272
                 URL: https://issues.apache.org/jira/browse/HAWQ-272
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Fault Tolerance
            Reporter: Dong Li
            Assignee: Lei Chang


In a cluster where a segment already has QEs (query executors) running, if you kill the segment's postmaster process (pid=59335), queries still work and the segment's status in gp_segment_configuration stays up.
{code}
ps -ef |grep postgres
  502 59309     1   0 10:07AM ??         0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
  502 59310 59309   0 10:07AM ??         0:00.38 postgres: port  5432, master logger process
  502 59313 59309   0 10:07AM ??         0:00.16 postgres: port  5432, stats collector process
  502 59314 59309   0 10:07AM ??         0:01.89 postgres: port  5432, writer process
  502 59315 59309   0 10:07AM ??         0:00.27 postgres: port  5432, checkpoint process
  502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
  502 59317 59309   0 10:07AM ??         0:00.29 postgres: port  5432, WAL Send Server process
  502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
  502 59319 59309   0 10:07AM ??         0:10.02 postgres: port  5432, master resource manager
  502 59335     1   0 10:07AM ??         0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
  502 59336 59335   0 10:07AM ??         0:00.61 postgres: port 40000, logger process
  502 59403 59309   0 10:07AM ??         0:02.28 postgres: port  5432, intern intern [local] con11 cmd63 idle [local]
  502 63451 59335   0 10:25AM ??         0:00.12 postgres: port 40000, stats collector process
  502 63452 59335   0 10:25AM ??         0:01.43 postgres: port 40000, writer process
  502 63453 59335   0 10:25AM ??         0:00.20 postgres: port 40000, checkpoint process
  502 63454 59335   0 10:25AM ??         0:03.64 postgres: port 40000, segment resource manager
  502 63966 59335   0 10:27AM ??         0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
  502 63967 59335   0 10:27AM ??         0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
  502 63968 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
  502 63969 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
  502 63970 59335   0 10:27AM ??         0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
  502 63971 59335   0 10:27AM ??         0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle

kill -9 59335

ps -ef |grep postgres
  502 59309     1   0 10:07AM ??         0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
  502 59310 59309   0 10:07AM ??         0:00.40 postgres: port  5432, master logger process
  502 59313 59309   0 10:07AM ??         0:00.17 postgres: port  5432, stats collector process
  502 59314 59309   0 10:07AM ??         0:02.01 postgres: port  5432, writer process
  502 59315 59309   0 10:07AM ??         0:00.28 postgres: port  5432, checkpoint process
  502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
  502 59317 59309   0 10:07AM ??         0:00.31 postgres: port  5432, WAL Send Server process
  502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
  502 59319 59309   0 10:07AM ??         0:10.64 postgres: port  5432, master resource manager
  502 59336     1   0 10:07AM ??         0:00.64 postgres: port 40000, logger process
  502 59403 59309   0 10:07AM ??         0:02.40 postgres: port  5432, intern intern [local] con11 cmd67 idle [local]
  502 63454     1   0 10:25AM ??         0:03.96 postgres: port 40000, segment resource manager
  502 63966     1   0 10:27AM ??         0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
  502 63967     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
  502 63968     1   0 10:27AM ??         0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
  502 63969     1   0 10:27AM ??         0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
  502 63970     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
  502 63971     1   0 10:27AM ??         0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
{code}
Then we run an INSERT. It succeeds, and the segment is still reported as up:
{code}
intern=# select count(*) from b;
  count
----------
 41058000
(1 row)

intern=# insert into b VALUES (1);
INSERT 0 1
intern=# select count(*) from b;
  count
----------
 41058001
(1 row)
intern=# select * from gp_segment_configuration ;
 registration_order | role | status | port  |  hostname  |  address
--------------------+------+--------+-------+------------+------------
                  0 | m    | u      |  5432 | doli.local | doli.local
                  1 | p    | u      | 40000 | localhost  | 127.0.0.1
(2 rows)
{code}

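This works because the segment's existing QEs survive the kill: in the second listing they (along with the segment's logger and resource manager) have been re-parented to PID 1. If the existing QEs are enough to execute a query, the query succeeds. Otherwise the segment's postmaster is asked to create a new QE; only at that point is the postmaster found to be dead and the segment marked down.
The problem is that a dead postmaster is detected only lazily in this way; we should check directly whether the segment's postmaster process is alive.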
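The following shell function sketches what such a direct liveness check could look like. It is only an illustration under stated assumptions, not HAWQ's fault-tolerance code: {{postmaster_alive}} is a hypothetical name, and it relies on the PostgreSQL convention that the first line of postmaster.pid in the data directory holds the postmaster's PID. Because kill -9 does not give the postmaster a chance to remove that file, the check cannot rely on the file's existence alone; it sends signal 0 to the recorded PID, which tests whether the process exists without delivering a signal.

{code}
#!/bin/sh
# Hypothetical helper (not part of HAWQ): succeed (exit status 0) if the
# postmaster that owns data directory $1 is still running, fail otherwise.
# Assumes the PostgreSQL convention that the first line of
# <datadir>/postmaster.pid is the postmaster's PID.
postmaster_alive() {
    pidfile="$1/postmaster.pid"

    # No pid file: the postmaster is not running (or was never started).
    [ -f "$pidfile" ] || return 1

    pid=$(head -n 1 "$pidfile")

    # Empty or non-numeric PID: treat the file as unusable.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # kill -9 leaves a stale postmaster.pid behind, so the file alone proves
    # nothing. Signal 0 only checks that the PID exists and can be signalled.
    kill -0 "$pid" 2>/dev/null
}
{code}

For example, running {{postmaster_alive /Users/intern/hawq-data-directory/segmentdd || echo down}} after the kill -9 above should print "down", unless PID 59335 has since been reused by another process. The function should run as the segment's OS user, because kill -0 also fails when the caller lacks permission to signal the process.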



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
