Chunling Wang created HAWQ-1149: ----------------------------------- Summary: Built-in function gp_persistent_build_all loses data in gp_relfile_node and gp_persistent_relfile_node Key: HAWQ-1149 URL: https://issues.apache.org/jira/browse/HAWQ-1149 Project: Apache HAWQ Issue Type: Bug Components: Core Reporter: Chunling Wang Assignee: Lei Chang
When we create a new table, and insert data into it. There will be records in gp_relfile_node, gp_persistent_relfile_node and gp_persistent_relation_node. But if we run the HAWQ build-in function gp_persistent_build_all, we will find that the record in gp_relfile_node and gp_persistent_relfile_node for this table is lost. And if there are more than 1 file in this talbe, we will get error when we drop this table. Here are the steps to recur this bug: 1. Create table a, and insert data into a with two concurrent process: {code} postgres=# create table a(id int); CREATE TABLE postgres=# insert into a select generate_series(1, 10000000); INSERT 0 10000000 {code} {code} postgres=# insert into a select generate_series(10000000, 20000000); INSERT 0 10000001 {code} 2. Check the persistent table and find two files in this table's directory: {code} postgres=# select oid from pg_class where relname='a'; oid --------- 3017232 (1 row) postgres=# select * from gp_relfile_node where relfilenode_oid=3017232; relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num -----------------+------------------+----------------+----------------------- 3017232 | 1 | (4,128) | 855050 3017232 | 2 | (4,129) | 855051 (2 rows) postgres=# select * from gp_persistent_relation_node where relfilenode_oid=3017232; tablespace_oid | database_oid | relfilenode_oid | persistent_state | reserved | parent_xid | persistent_serial_num | previous_free_tid ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+------------------- 16385 | 16387 | 3017232 | 2 | 0 | 0 | 158943 | (0,0) (1 row) postgres=# select * from gp_persistent_relfile_node where relfilenode_oid=3017232; tablespace_oid | database_oid | relfilenode_oid | segment_file_num | relation_storage_manager | persistent_state | relation_bufpool_kind | parent_xid | persistent_serial_num | previous_free_tid ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+------------------- 16385 | 16387 | 3017232 | 1 | 2 | 2 | 0 | 0 | 855050 | (0,0) 16385 | 16387 | 3017232 | 2 | 2 | 2 | 0 | 0 | 855051 | (0,0) (2 rows) hadoop fs -ls /hawq_default/16385/16387/3017232 -rw------- 3 wangchunling supergroup 100103584 2016-11-08 17:02 /hawq_default/16385/16387/3017232/1 -rw------- 3 wangchunling supergroup 100103600 2016-11-08 17:02 /hawq_default/16385/16387/3017232/2 {code} 3. Rebuilt persistent tables. {code} postgres=# insert into a select generate_series(10000000, 20000000); INSERT 0 10000001 postgres=# select gp_persistent_reset_all(); gp_persistent_reset_all ------------------------- 1 (1 row) postgres=# select gp_persistent_build_all(false); gp_persistent_build_all ------------------------- 1 (1 row) {code} 4. Check persistent table and find data lost in gp_relfile_node and gp_persistent_relfile_node. {code} postgres=# select * from gp_relfile_node where relfilenode_oid=3017232; relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num -----------------+------------------+----------------+----------------------- (0 rows) postgres=# select * from gp_persistent_relation_node where relfilenode_oid=3017232; tablespace_oid | database_oid | relfilenode_oid | persistent_state | reserved | parent_xid | persistent_serial_num | previous_free_tid ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+------------------- 16385 | 16387 | 3017232 | 2 | 0 | 0 | 159020 | (0,0) (1 row) postgres=# select * from gp_persistent_relfile_node where relfilenode_oid=3017232; tablespace_oid | database_oid | relfilenode_oid | segment_file_num | relation_storage_manager | persistent_state | relation_bufpool_kind | parent_xid | persistent_serial_num | previous_free_tid ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+------------------- (0 rows) {code} 5. Drop talbe a and get error. {code} postgres=# DROP TABLE a; ERROR: TID for persistent 'Relation Directory: '16385/16387/3017232'' tuple is invalid (0,0) (index 2, transaction kind 'Commit') (persistentendxactrec.c:264) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)