[ https://issues.apache.org/jira/browse/HAWQ-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663116#comment-15663116 ]
Ming LI commented on HAWQ-1149: ------------------------------- Internally there are 3 bugs: 1) hawq2.0 changed the file path of relation file on hdfs from filespace/db/relfile.filenum to filespace/db/table/relfile/filenum. When we scan relfile on hdfs, we should change scan logic correspondingly. 2) Fetched dummy persistentTid & persistentSerialNum to PersistentRelation_MarkCreatePending() . 3) Need to reset Relation->rd_relationnodeinfo.isPresent, so that next time persistentid and serial# can be refetched during PersistentBuild_BuildDb(). > Built-in function gp_persistent_build_all loses data in gp_relfile_node and > gp_persistent_relfile_node > ------------------------------------------------------------------------------------------------------ > > Key: HAWQ-1149 > URL: https://issues.apache.org/jira/browse/HAWQ-1149 > Project: Apache HAWQ > Issue Type: Bug > Components: Core > Reporter: Chunling Wang > Assignee: Lei Chang > > When we create a new table, and insert data into it. There will be records in > gp_relfile_node, gp_persistent_relfile_node and gp_persistent_relation_node. > But if we run the HAWQ build-in function gp_persistent_build_all, we will > find that the record in gp_relfile_node and gp_persistent_relfile_node for > this table is lost. And if there are more than 1 file in this talbe, we will > get error when we drop this table. Here are the steps to recur this bug: > 1. Create table a, and insert data into a with two concurrent process: > {code} > postgres=# create table a(id int); > CREATE TABLE > postgres=# insert into a select generate_series(1, 10000000); > INSERT 0 10000000 > {code} > {code} > postgres=# insert into a select generate_series(10000000, 20000000); > INSERT 0 10000001 > {code} > 2. Check the persistent table and find two files in this table's directory: > {code} > postgres=# select oid from pg_class where relname='a'; > oid > --------- > 3017232 > (1 row) > postgres=# select * from gp_relfile_node where relfilenode_oid=3017232; > relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num > -----------------+------------------+----------------+----------------------- > 3017232 | 1 | (4,128) | 855050 > 3017232 | 2 | (4,129) | 855051 > (2 rows) > postgres=# select * from gp_persistent_relation_node where > relfilenode_oid=3017232; > tablespace_oid | database_oid | relfilenode_oid | persistent_state | > reserved | parent_xid | persistent_serial_num | previous_free_tid > ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+------------------- > 16385 | 16387 | 3017232 | 2 | > 0 | 0 | 158943 | (0,0) > (1 row) > postgres=# select * from gp_persistent_relfile_node where > relfilenode_oid=3017232; > tablespace_oid | database_oid | relfilenode_oid | segment_file_num | > relation_storage_manager | persistent_state | relation_bufpool_kind | > parent_xid | persistent_serial_num | previous_free_tid > ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+------------------- > 16385 | 16387 | 3017232 | 1 | > 2 | 2 | 0 | 0 | > 855050 | (0,0) > 16385 | 16387 | 3017232 | 2 | > 2 | 2 | 0 | 0 | > 855051 | (0,0) > (2 rows) > hadoop fs -ls /hawq_default/16385/16387/3017232 > -rw------- 3 wangchunling supergroup 100103584 2016-11-08 17:02 > /hawq_default/16385/16387/3017232/1 > -rw------- 3 wangchunling supergroup 100103600 2016-11-08 17:02 > /hawq_default/16385/16387/3017232/2 > {code} > 3. Rebuilt persistent tables. > {code} > postgres=# insert into a select generate_series(10000000, 20000000); > INSERT 0 10000001 > postgres=# select gp_persistent_reset_all(); > gp_persistent_reset_all > ------------------------- > 1 > (1 row) > postgres=# select gp_persistent_build_all(false); > gp_persistent_build_all > ------------------------- > 1 > (1 row) > {code} > 4. Check persistent table and find data lost in gp_relfile_node and > gp_persistent_relfile_node. > {code} > postgres=# select * from gp_relfile_node where relfilenode_oid=3017232; > relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num > -----------------+------------------+----------------+----------------------- > (0 rows) > postgres=# select * from gp_persistent_relation_node where > relfilenode_oid=3017232; > tablespace_oid | database_oid | relfilenode_oid | persistent_state | > reserved | parent_xid | persistent_serial_num | previous_free_tid > ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+------------------- > 16385 | 16387 | 3017232 | 2 | > 0 | 0 | 159020 | (0,0) > (1 row) > postgres=# select * from gp_persistent_relfile_node where > relfilenode_oid=3017232; > tablespace_oid | database_oid | relfilenode_oid | segment_file_num | > relation_storage_manager | persistent_state | relation_bufpool_kind | > parent_xid | persistent_serial_num | previous_free_tid > ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+------------------- > (0 rows) > {code} > 5. Drop talbe a and get error. > {code} > postgres=# DROP TABLE a; > ERROR: TID for persistent 'Relation Directory: '16385/16387/3017232'' tuple > is invalid (0,0) (index 2, transaction kind 'Commit') > (persistentendxactrec.c:264) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)