[ https://issues.apache.org/jira/browse/HAWQ-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chunling Wang closed HAWQ-812.
------------------------------

> Activate standby master failed after create a new database
> ----------------------------------------------------------
>
> Key: HAWQ-812
> URL: https://issues.apache.org/jira/browse/HAWQ-812
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Chunling Wang
> Assignee: Ming LI
> Fix For: 2.0.0.0-incubating
>
>
> Activating the standby master fails after a new database is created. However,
> it succeeds if no new database is created, even if a new table is created and
> data is inserted.
> 1. Create a new database 'gptest'
> {code}
> [gpadmin@test1 ~]$ psql -l
>                  List of databases
>    Name    |  Owner  | Encoding | Access privileges
> -----------+---------+----------+-------------------
>  postgres  | gpadmin | UTF8     |
>  template0 | gpadmin | UTF8     |
>  template1 | gpadmin | UTF8     |
> (3 rows)
> [gpadmin@test1 ~]$ createdb gptest
> [gpadmin@test1 ~]$ psql -l
>                  List of databases
>    Name    |  Owner  | Encoding | Access privileges
> -----------+---------+----------+-------------------
>  gptest    | gpadmin | UTF8     |
>  postgres  | gpadmin | UTF8     |
>  template0 | gpadmin | UTF8     |
>  template1 | gpadmin | UTF8     |
> (4 rows)
> {code}
> 2. Stop HAWQ master
> {code}
> [gpadmin@test1 ~]$ hawq stop master -a
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Prepare to do 'hawq
> stop'
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-You can find log in:
> 20160613:20:13:44:068559
> hawq_stop:test1:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_stop_20160613.log
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-GPHOME is set to:
> 20160613:20:13:44:068559
> hawq_stop:test1:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq with args:
> ['stop', 'master']
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-There are 0
> connections to the database
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Commencing Master
> instance shutdown with mode='smart'
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Master host=test1
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq master
> 20160613:20:13:46:068559 hawq_stop:test1:gpadmin-[INFO]:-Master stopped
> successfully
> {code}
> 3. Activate standby master
> {code}
> [gpadmin@test1 ~]$ ssh test5 'source
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/./greenplum_path.sh;
> hawq activate standby -a'
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Prepare to do
> 'hawq activate'
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-You can find log
> in:
> 20160613:20:14:14:126841
> hawq_activate:test5:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_activate_20160613.log
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-GPHOME is set to:
> 20160613:20:14:14:126841
> hawq_activate:test5:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Activate hawq
> with args: ['activate', 'standby']
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Starting to
> activate standby master 'test5'
> 20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-HAWQ master is
> not running, skip
> 20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping all the
> running segments
> 20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-
> 20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping running
> standby
> 20160613:20:14:23:126841 hawq_activate:test5:gpadmin-[INFO]:-Update master
> host name in hawq-site.xml
> 20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-GUC
> hawq_master_address_host already exist in hawq-site.xml
> Update it with value: test5
> 20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-Remove current
> standby from hawq-site.xml
> 20160613:20:14:39:126841 hawq_activate:test5:gpadmin-[INFO]:-Start master in
> master only mode
> {code}
> It hangs and cannot start the master.
> The master log is as follows:
> {code}
> 2016-06-13 20:14:40.268022
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","database
> system was shut down at 2016-06-13 20:02:50 PDT",,,,,,,0,,"xlog.c",6205,
> 2016-06-13 20:14:40.268112
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","found
> recovery.conf file indicating standby takeover recovery
> needed",,,,,,,0,,"xlog.c",5485,
> 2016-06-13 20:14:40.268131
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint
> record is at 0/1C75EF0",,,,,,,0,,"xlog.c",6304,
> 2016-06-13 20:14:40.268143
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo record
> is at 0/1C75EF0; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6338,
> 2016-06-13 20:14:40.268155
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next
> transaction ID: 0/1003; next OID: 16508",,,,,,,0,,"xlog.c",6342,
> 2016-06-13 20:14:40.268165
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next
> MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6345,
> 2016-06-13 20:14:40.268176
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Forcing Crash
> Recovery for Master Standby takeover",,,,,,,0,,"xlog.c",6389,
> 2016-06-13 20:14:40.268195
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby
> takeover recovery in progress",,,,,,,0,,"xlog.c",6427,
> 2016-06-13 20:14:40.268891
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo starts
> at 0/1C75F40",,,,,,,0,,"xlog.c",6523,
> 2016-06-13 20:14:40.273313
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","record with
> zero length at 0/2639190",,,,,,,0,,"xlog.c",4110,
> 2016-06-13 20:14:40.273338
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo done at
> 0/2639140",,,,,,,0,,"xlog.c",6560,
> 2016-06-13 20:14:40.273352
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","end of
> transaction log
> location is 0/2639190",,,,,,,0,,"xlog.c",6582,
> 2016-06-13 20:14:40.273460
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby
> takeover recovery complete",,,,,,,0,,"xlog.c",5506,
> 2016-06-13 20:14:40.274904
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Need to
> Repair global sequence number 600 so use scanned maximum value 749
> ('gp_persistent_relfile_node')",,,,,,,0,,"cdbpersistentstore.c",519,
> 2016-06-13 20:14:40.275093
> PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished
> startup pass 1. Proceeding to startup crash recovery passes 2 and
> 3.",,,,,,,0,,"xlog.c",6816,
> 2016-06-13 20:14:40.284820
> PDT,,,p127519,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished
> startup crash recovery pass 2",,,,,,,0,,"xlog.c",6987,
> 2016-06-13 20:14:40.289053
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery
> restart point at 0/1C75F40",,,,,"xlog redo checkpoint: redo 0/1C75F40; undo
> 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C75F40; LSN 0/1C75F90: prev 0/1C75EF0; xid 0: XLOG -
> checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1;
> offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.291597
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery
> restart point at 0/1C763A0",,,,,"xlog redo checkpoint: redo 0/1C763A0; undo
> 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C763A0; LSN 0/1C763F0: prev 0/1C76370; xid 0: XLOG -
> checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1;
> offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.292625
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery
> restart point at 0/1C763F0",,,,,"xlog redo checkpoint: redo 0/1C763F0; undo
> 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C763F0; LSN 0/1C76440: prev 0/1C763A0; xid 0:
> XLOG -
> checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1;
> offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.295223
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery
> restart point at 0/1C76D90",,,,,"xlog redo checkpoint: redo 0/1C76D90; undo
> 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online
> REDO PASS 3 @ 0/1C76D90; LSN 0/1C76DE0: prev 0/1C76D60; xid 0: XLOG -
> checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1;
> offset 0; online",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.295618
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery
> restart point at 0/1C76DE0",,,,,"xlog redo checkpoint: redo 0/1C76DE0; undo
> 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online
> REDO PASS 3 @ 0/1C76DE0; LSN 0/1C76E30: prev 0/1C76D90; xid 0: XLOG -
> checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1;
> offset 0; online",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.306365
> PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"FATAL","58P01","could not
> open relation 1663/16508/1247: No such file or directory","Database directory
> ""base/16508"" does not exist",,,,"xlog redo newpage: rel 1663/16508/1247;
> blk 0
> REDO PASS 3 @ 0/1C7B7A8; LSN 0/1C83800: prev 0/1C7B360; xid 1052: Heap -
> newpage: rel 1663/16508/1247; blk 0",,0,,"md.c",1012,"Stack trace:
> 1 0x87f232 postgres errstart + 0x252
> 2 0x7ad57a postgres <symbol not found> + 0x7ad57a
> 3 0x7ad678 postgres mdnblocks + 0x18
> 4 0x7af3b6 postgres smgrnblocks + 0x16
> 5 0x4f97e7 postgres XLogReadBuffer + 0x17
> 6 0x4c1bf7 postgres heap_redo + 0x4e7
> 7 0x4eb550 postgres <symbol not found> + 0x4eb550
> 8 0x4f4b65 postgres StartupXLOG_Pass3 + 0x155
> 9 0x4f6c08 postgres StartupProcessMain + 0x308
> 10 0x55629d postgres AuxiliaryProcessMain + 0x5bd
> 11 0x767706 postgres <symbol not found> + 0x767706
> 12 0x7689ef postgres <symbol not found> + 0x7689ef
> 13 0x76d7fd postgres
> <symbol not found> + 0x76d7fd
> 14 0x76f34e postgres PostmasterMain + 0xc7e
> 15 0x6c7e9a postgres main + 0x48a
> 16 0x3e0541ed1d libc.so.6 __libc_start_main + 0xfd
> 17 0x4a26a1 postgres <symbol not found> + 0x4a26a1
> "
> 2016-06-13 20:14:40.308171
> PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","startup pass
> 3 process (PID 127520) exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
> 2016-06-13 20:14:40.308203
> PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","aborting
> startup due to startup process failure",,,,,,,0,,"postmaster.c",3912,
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
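
For anyone triaging a similar startup failure, the master log quoted above can be filtered down to the records that stopped recovery. The following is a minimal sketch, not part of HAWQ: the function names are hypothetical, and it assumes the log is comma-separated with the severity label as a standalone field, immediately followed by the SQLSTATE code and the message, as in the excerpt. The sample records are abridged from the log above and unwrapped from the email quoting.

```python
import csv
import io

# Severity labels assumed to appear as a standalone CSV field (an assumption
# based on the quoted log excerpt, not on HAWQ documentation).
SEVERITIES = {"LOG", "WARNING", "ERROR", "FATAL", "PANIC"}


def parse_records(text):
    """Yield (timestamp, severity, sqlstate, message) for each CSV record."""
    # newline="" lets the csv module handle newlines inside quoted fields,
    # such as the multi-line context and stack trace fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        for i, field in enumerate(row):
            if field in SEVERITIES and i + 2 < len(row):
                yield row[0], field, row[i + 1], row[i + 2]
                break


def fatal_records(text):
    """Return only the records that would abort startup."""
    return [r for r in parse_records(text) if r[1] in ("FATAL", "PANIC")]


# Two records abridged from the master log in this report.
sample = (
    '2016-06-13 20:14:40.273460 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,'
    '"LOG","00000","standby takeover recovery complete",,,,,,,0,,"xlog.c",5506,\n'
    '2016-06-13 20:14:40.306365 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,'
    '"FATAL","58P01","could not open relation 1663/16508/1247: '
    'No such file or directory",'
    '"Database directory ""base/16508"" does not exist",,,,,,0,,"md.c",1012,\n'
)

for timestamp, severity, sqlstate, message in fatal_records(sample):
    print(timestamp, severity, sqlstate, message)
```

Searching for the severity label instead of using a fixed column index keeps the sketch usable if the number of leading fields differs between versions.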