Dong Li created HAWQ-478:
----------------------------

             Summary: Bug when shut down cluster during recovery pass3   
                 Key: HAWQ-478
                 URL: https://issues.apache.org/jira/browse/HAWQ-478
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Transaction
            Reporter: Dong Li
            Assignee: Lei Chang


Shutting down the cluster while the master is recovering in pass 3 causes an inconsistency
between pg_class and the gp_persistent table, which results in data loss.
{code}
2016-03-01 01:56:33.032318 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint record is at 0/302AD30",,,,,,,0,,"xlog.c",6304,
2016-03-01 01:56:33.032337 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","redo record is at 0/302AD30; undo record is at 0/0; shutdown FALSE",,,,,,,0,,"xlog.c",6338,
2016-03-01 01:56:33.032353 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","next transaction ID: 0/1045; next OID: 24726",,,,,,,0,,"xlog.c",6342,
2016-03-01 01:56:33.032367 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6345,
2016-03-01 01:56:33.032382 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",6434,
2016-03-01 01:56:33.033329 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","redo starts at 0/302AD80",,,,,,,0,,"xlog.c",6523,
2016-03-01 01:56:33.089749 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero length at 0/77A7708",,,,,,,0,,"xlog.c",4110,
2016-03-01 01:56:33.089792 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","redo done at 0/77A76D8",,,,,,,0,,"xlog.c",6560,
2016-03-01 01:56:33.089893 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","end of transaction log location is 0/77A7708",,,,,,,0,,"xlog.c",6582,
2016-03-01 01:56:33.738889 PST,,,p119941,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup pass 1.  Proceeding to startup crash recovery passes 2 and 3.",,,,,,,0,,"xlog.c",6816,
2016-03-01 01:56:34.525387 PST,,,p118947,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","received smart shutdown request",,,,,,,0,,"postmaster.c",3447,
2016-03-01 01:56:35.042857 PST,,,p119958,th731297984,,,,0,,,seg-10000,,,,,"WARNING","XX000","could not remove relation directory 16385/16536/20219: Success (smgr.c:1049)",,,,,"Dropping file-system object -- Relation Directory: '16385/16536/20219'",,0,,"smgr.c",1049,
2016-03-01 01:56:35.131058 PST,,,p119958,th731297984,,,,0,,,seg-10000,,,,,"WARNING","XX000","could not remove relation directory 16385/16536/16894: Success (smgr.c:1049)",,,,,"Dropping file-system object -- Relation Directory: '16385/16536/16894'",,0,,"smgr.c",1049,
2016-03-01 01:56:35.584893 PST,,,p119958,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup crash recovery pass 2",,,,,,,0,,"xlog.c",6987,
2016-03-01 01:56:35.590423 PST,,,p120017,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","shutting down",,,,,,,0,,"xlog.c",7853,
2016-03-01 01:56:35.592973 PST,,,p120017,th731297984,,,,0,,,seg-10000,,,,,"LOG","00000","database system is shut down",,,,,,,0,,"xlog.c",7874,
{code}


{code}
cr_workload=# select * from pg_class where relname like 'create_insert%' and relname not like '%prt%';
    relname     | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl |    reloptions
----------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+-------------------
 create_insert1 |         2200 |  696503 |       10 |     0 |      702761 |             0 |        0 |         0 |             0 |             0 |             0 |             0 | f           | f           | r       | a          |        3 |         0 |           0 |        0 |        0 |       0 | f          | f          | f           | t              |        11609 |        | {appendonly=true}
(1 row)

cr_workload=# \d
No relations found.
cr_workload=# select * from create_insert1;
ERROR:  relation "create_insert1" does not exist
LINE 1: select * from create_insert1;
                      ^
cr_workload=# select * from gp_persistent_relation_node where relfilenode_oid = 702761;
 tablespace_oid | database_oid | relfilenode_oid | persistent_state | reserved | parent_xid | persistent_serial_num | previous_free_tid
----------------+--------------+-----------------+------------------+----------+------------+-----------------------+-------------------
          16385 |       696501 |          702761 |                2 |        0 |          0 |                 31380 | (0,0)
(1 row)
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)