[ https://issues.apache.org/jira/browse/HAWQ-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Li closed HAWQ-272.
------------------------

> Segment status will not be down after killing postmaster process of segment
> ----------------------------------------------------------------------------
>
>                 Key: HAWQ-272
>                 URL: https://issues.apache.org/jira/browse/HAWQ-272
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Fault Tolerance
>            Reporter: Dong Li
>            Assignee: Lin Wen
>             Fix For: 2.0.0-beta-incubating
>
>
> In a cluster that already has QEs (query executors), if you kill the postmaster
> process of the segment (pid=59335), the segment keeps working and its status in
> gp_segment_configuration stays up.
> {code}
> ps -ef |grep postgres
> 502 59309 1 0 10:07AM ?? 0:05.39 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
> 502 59310 59309 0 10:07AM ?? 0:00.38 postgres: port 5432, master logger process
> 502 59313 59309 0 10:07AM ?? 0:00.16 postgres: port 5432, stats collector process
> 502 59314 59309 0 10:07AM ?? 0:01.89 postgres: port 5432, writer process
> 502 59315 59309 0 10:07AM ?? 0:00.27 postgres: port 5432, checkpoint process
> 502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
> 502 59317 59309 0 10:07AM ?? 0:00.29 postgres: port 5432, WAL Send Server process
> 502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
> 502 59319 59309 0 10:07AM ?? 0:10.02 postgres: port 5432, master resource manager
> 502 59335 1 0 10:07AM ?? 0:12.94 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/segmentdd -i -M segment -p 40000 --silent-mode=true
> 502 59336 59335 0 10:07AM ?? 0:00.61 postgres: port 40000, logger process
> 502 59403 59309 0 10:07AM ?? 0:02.28 postgres: port 5432, intern intern [local] con11 cmd63 idle [local]
> 502 63451 59335 0 10:25AM ?? 0:00.12 postgres: port 40000, stats collector process
> 502 63452 59335 0 10:25AM ?? 0:01.43 postgres: port 40000, writer process
> 502 63453 59335 0 10:25AM ?? 0:00.20 postgres: port 40000, checkpoint process
> 502 63454 59335 0 10:25AM ?? 0:03.64 postgres: port 40000, segment resource manager
> 502 63966 59335 0 10:27AM ?? 0:04.88 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
> 502 63967 59335 0 10:27AM ?? 0:04.90 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
> 502 63968 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
> 502 63969 59335 0 10:27AM ?? 0:07.12 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
> 502 63970 59335 0 10:27AM ?? 0:04.89 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
> 502 63971 59335 0 10:27AM ?? 0:04.86 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
> kill -9 59335
> ps -ef |grep postgres
> 502 59309 1 0 10:07AM ?? 0:05.64 /Users/intern/work/code/main/hawq-db-devel/bin/postgres -D /Users/intern/hawq-data-directory/masterdd -i -M master -p 5432 --silent-mode=true
> 502 59310 59309 0 10:07AM ?? 0:00.40 postgres: port 5432, master logger process
> 502 59313 59309 0 10:07AM ?? 0:00.17 postgres: port 5432, stats collector process
> 502 59314 59309 0 10:07AM ?? 0:02.01 postgres: port 5432, writer process
> 502 59315 59309 0 10:07AM ?? 0:00.28 postgres: port 5432, checkpoint process
> 502 59316 59309 0 10:07AM ?? 0:00.09 postgres: port 5432, seqserver process
> 502 59317 59309 0 10:07AM ?? 0:00.31 postgres: port 5432, WAL Send Server process
> 502 59318 59309 0 10:07AM ?? 0:00.01 postgres: port 5432, DFS Metadata Cache process
> 502 59319 59309 0 10:07AM ?? 0:10.64 postgres: port 5432, master resource manager
> 502 59336 1 0 10:07AM ?? 0:00.64 postgres: port 40000, logger process
> 502 59403 59309 0 10:07AM ?? 0:02.40 postgres: port 5432, intern intern [local] con11 cmd67 idle [local]
> 502 63454 1 0 10:25AM ?? 0:03.96 postgres: port 40000, segment resource manager
> 502 63966 1 0 10:27AM ?? 0:04.96 postgres: port 40000, intern intern 127.0.0.1(56871) con11 seg0 idle
> 502 63967 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56873) con11 seg1 idle
> 502 63968 1 0 10:27AM ?? 0:07.20 postgres: port 40000, intern intern 127.0.0.1(56875) con11 seg2 idle
> 502 63969 1 0 10:27AM ?? 0:07.21 postgres: port 40000, intern intern 127.0.0.1(56877) con11 seg3 idle
> 502 63970 1 0 10:27AM ?? 0:04.98 postgres: port 40000, intern intern 127.0.0.1(56879) con11 seg4 idle
> 502 63971 1 0 10:27AM ?? 0:04.94 postgres: port 40000, intern intern 127.0.0.1(56881) con11 seg5 idle
> {code}
> Then we execute an INSERT statement:
> {code}
> intern=# select count(*) from b;
>   count
> ----------
>  41058000
> (1 row)
>
> intern=# insert into b VALUES (1);
> INSERT 0 1
> intern=# select count(*) from b;
>   count
> ----------
>  41058001
> (1 row)
>
> intern=# select * from gp_segment_configuration ;
>  registration_order | role | status | port  |  hostname  |  address
> --------------------+------+--------+-------+------------+------------
>                   0 | m    | u      |  5432 | doli.local | doli.local
>                   1 | p    | u      | 40000 | localhost  | 127.0.0.1
> (2 rows)
> {code}
> If the existing QEs are enough to execute the query, it succeeds. Otherwise the
> segment postmaster is asked to create new QEs; only then is the postmaster found
> to be dead and the segment marked as down.
> The problem is that the segment's status does not reflect whether its postmaster
> process is alive; we should check the postmaster's liveness directly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
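The issue ends by proposing a direct check of whether the segment's postmaster is alive. The following is a rough POSIX shell sketch of such a check, run on the segment host. It assumes, without confirmation from the issue, that the segment data directory (`/Users/intern/hawq-data-directory/segmentdd` in the transcript) contains a PostgreSQL-style `postmaster.pid` file whose first line is the postmaster PID. The function name `segment_postmaster_alive` is hypothetical, and this sketch is not necessarily how HAWQ-272 was resolved.

```shell
# Hypothetical helper (not part of HAWQ): succeed only if the postmaster that
# owns the given data directory is still running.
# Assumption: the directory holds a PostgreSQL-style postmaster.pid file whose
# first line is the postmaster PID.
segment_postmaster_alive() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"

    # No readable pid file: treat the postmaster as down.
    [ -r "$pidfile" ] || return 1

    # Use only the first line, and reject anything that is not a positive integer.
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|0|*[!0-9]*) return 1 ;;
    esac

    # Signal 0 delivers nothing; kill only reports whether the PID exists and
    # may be signaled (run this as the same OS user as the postmaster).
    kill -0 "$pid" 2>/dev/null
}

# Usage against the segment data directory from the transcript above.
if segment_postmaster_alive /Users/intern/hawq-data-directory/segmentdd; then
    echo "segment postmaster is up"
else
    echo "segment postmaster is down"
fi
```

SIGKILL cannot be caught, so a postmaster killed with `kill -9` has no chance to remove its pid file. The file would therefore still name PID 59335, and `kill -0` fails because that process no longer exists. The transcript also shows a second symptom that a check could use: after the kill, the surviving port-40000 processes (the QEs, the logger, and the segment resource manager) were reparented to PID 1. A pid-file check can be fooled if the operating system reuses the PID for an unrelated process, so treat this as an illustration rather than a complete fault detector.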