[ https://issues.apache.org/jira/browse/HAWQ-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Li closed HAWQ-272.
------------------------

> Segment status will not be down after killing postmaster process of segment 
> ----------------------------------------------------------------------------
>
>                 Key: HAWQ-272
>                 URL: https://issues.apache.org/jira/browse/HAWQ-272
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Fault Tolerance
>            Reporter: Dong Li
>            Assignee: Lin Wen
>             Fix For: 2.0.0-beta-incubating
>
>
> In a cluster that already has QEs (query executors) running, if you kill the
> postmaster process of the segment (pid=59335), the cluster keeps working and the
> segment's status in gp_segment_configuration stays up.
> {code}
> ps -ef |grep postgres
>   502 59309     1   0 10:07AM ??         0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
>   502 59310 59309   0 10:07AM ??         0:00.38 postgres: port  5432, master logger process
>   502 59313 59309   0 10:07AM ??         0:00.16 postgres: port  5432, stats collector process
>   502 59314 59309   0 10:07AM ??         0:01.89 postgres: port  5432, writer process
>   502 59315 59309   0 10:07AM ??         0:00.27 postgres: port  5432, checkpoint process
>   502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
>   502 59317 59309   0 10:07AM ??         0:00.29 postgres: port  5432, WAL Send Server process
>   502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
>   502 59319 59309   0 10:07AM ??         0:10.02 postgres: port  5432, master resource manager
>   502 59335     1   0 10:07AM ??         0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
>   502 59336 59335   0 10:07AM ??         0:00.61 postgres: port 40000, logger process
>   502 59403 59309   0 10:07AM ??         0:02.28 postgres: port  5432, intern intern [local] con11 cmd63 idle [local]
>   502 63451 59335   0 10:25AM ??         0:00.12 postgres: port 40000, stats collector process
>   502 63452 59335   0 10:25AM ??         0:01.43 postgres: port 40000, writer process
>   502 63453 59335   0 10:25AM ??         0:00.20 postgres: port 40000, checkpoint process
>   502 63454 59335   0 10:25AM ??         0:03.64 postgres: port 40000, segment resource manager
>   502 63966 59335   0 10:27AM ??         0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
>   502 63967 59335   0 10:27AM ??         0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
>   502 63968 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
>   502 63969 59335   0 10:27AM ??         0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
>   502 63970 59335   0 10:27AM ??         0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
>   502 63971 59335   0 10:27AM ??         0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
> kill -9 59335
> ps -ef |grep postgres
>   502 59309     1   0 10:07AM ??         0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
>   502 59310 59309   0 10:07AM ??         0:00.40 postgres: port  5432, master logger process
>   502 59313 59309   0 10:07AM ??         0:00.17 postgres: port  5432, stats collector process
>   502 59314 59309   0 10:07AM ??         0:02.01 postgres: port  5432, writer process
>   502 59315 59309   0 10:07AM ??         0:00.28 postgres: port  5432, checkpoint process
>   502 59316 59309   0 10:07AM ??         0:00.09 postgres: port  5432, seqserver process
>   502 59317 59309   0 10:07AM ??         0:00.31 postgres: port  5432, WAL Send Server process
>   502 59318 59309   0 10:07AM ??         0:00.01 postgres: port  5432, DFS Metadata Cache process
>   502 59319 59309   0 10:07AM ??         0:10.64 postgres: port  5432, master resource manager
>   502 59336     1   0 10:07AM ??         0:00.64 postgres: port 40000, logger process
>   502 59403 59309   0 10:07AM ??         0:02.40 postgres: port  5432, intern intern [local] con11 cmd67 idle [local]
>   502 63454     1   0 10:25AM ??         0:03.96 postgres: port 40000, segment resource manager
>   502 63966     1   0 10:27AM ??         0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
>   502 63967     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
>   502 63968     1   0 10:27AM ??         0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
>   502 63969     1   0 10:27AM ??         0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
>   502 63970     1   0 10:27AM ??         0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
>   502 63971     1   0 10:27AM ??         0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
> {code}
> Then we execute an INSERT statement.
> {code}
> intern=# select count(*) from b;
>   count
> ----------
>  41058000
> (1 row)
> intern=# insert into b VALUES (1);
> INSERT 0 1
> intern=# select count(*) from b;
>   count
> ----------
>  41058001
> (1 row)
> intern=# select * from gp_segment_configuration ;
>  registration_order | role | status | port  |  hostname  |  address
> --------------------+------+--------+-------+------------+------------
>                   0 | m    | u      |  5432 | doli.local | doli.local
>                   1 | p    | u      | 40000 | localhost  | 127.0.0.1
> (2 rows)
> {code}
> If the existing QEs are enough to execute the query, it succeeds. Otherwise,
> the segment postmaster has to be asked to create new QEs; only then is the
> postmaster found to be dead and the segment marked as down.
> The problem is that we should check whether the segment's postmaster process
> is alive.
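As a rough illustration of the check being asked for, the shell sketch below tests whether a segment's postmaster is still alive. It assumes the PostgreSQL convention that the first line of `postmaster.pid` in the data directory holds the postmaster PID; the function name `postmaster_alive` is hypothetical, and this is not how HAWQ's fault tolerance service actually detects failures.

```sh
#!/bin/sh
# Hypothetical sketch, not HAWQ code: report whether the postmaster that owns
# a data directory is still running. Assumes the PostgreSQL convention that
# the first line of <datadir>/postmaster.pid is the postmaster PID.

postmaster_alive() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"

    # No readable pid file: never started, or shut down cleanly.
    [ -r "$pidfile" ] || return 1

    # First line of postmaster.pid holds the postmaster PID.
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not alive
    esac

    # kill -0 sends no signal; it only checks that the PID exists and that we
    # are allowed to signal it (a process owned by another user also fails).
    kill -0 "$pid" 2>/dev/null
}

# Example, using the segment data directory from this report.
if postmaster_alive /Users/intern/hawq-data-directory/segmentdd; then
    echo "segment postmaster is up"
else
    echo "segment postmaster is down"
fi
```

`kill -9` leaves a stale `postmaster.pid` behind, so the sketch relies on `kill -0` failing for the dead PID (59335 here). If that PID were reused by an unrelated process, the check would give a false positive, so a real implementation would also need to confirm the process identity, for example by connecting to the segment port.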



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
