[ 
https://issues.apache.org/jira/browse/HAWQ-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663116#comment-15663116
 ] 

Ming LI commented on HAWQ-1149:
-------------------------------

Internally there are 3 bugs:
1) hawq2.0 changed the file path of relation file on hdfs from 
filespace/db/relfile.filenum to filespace/db/table/relfile/filenum. When we 
scan relfile on hdfs, we should change scan logic correspondingly.
2) Fetched dummy persistentTid & persistentSerialNum to 
PersistentRelation_MarkCreatePending() .
3) Need to reset Relation->rd_relationnodeinfo.isPresent, so that next time 
persistentid and serial# can be refetched during PersistentBuild_BuildDb().

> Built-in function gp_persistent_build_all loses data in gp_relfile_node and 
> gp_persistent_relfile_node
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1149
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1149
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core
>            Reporter: Chunling Wang
>            Assignee: Lei Chang
>
> When we create a new table, and insert data into it. There will be records in 
> gp_relfile_node, gp_persistent_relfile_node and gp_persistent_relation_node. 
> But if we run the HAWQ build-in function gp_persistent_build_all, we will 
> find that the record in gp_relfile_node and gp_persistent_relfile_node for 
> this table is lost. And if there are more than 1 file in this talbe, we will 
> get error when we drop this table. Here are the steps to recur this bug:
> 1. Create table a, and insert data into a with two concurrent process:
> {code}
> postgres=# create table a(id int);
> CREATE TABLE
> postgres=# insert into a select generate_series(1, 10000000);
> INSERT 0 10000000
> {code}
> {code}
> postgres=# insert into a select generate_series(10000000, 20000000);
> INSERT 0 10000001
> {code}
> 2. Check the persistent table and find two files in this table's directory:
> {code}
> postgres=# select oid from pg_class where relname='a';
>    oid
> ---------
>  3017232
> (1 row)
> postgres=# select * from gp_relfile_node where relfilenode_oid=3017232;
>  relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num
> -----------------+------------------+----------------+-----------------------
>          3017232 |                1 | (4,128)        |                855050
>          3017232 |                2 | (4,129)        |                855051
> (2 rows)
> postgres=# select * from gp_persistent_relation_node where 
> relfilenode_oid=3017232;
>  tablespace_oid | database_oid | relfilenode_oid | persistent_state | 
> reserved | parent_xid | persistent_serial_num | previous_free_tid
> ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+-------------------
>           16385 |        16387 |         3017232 |                2 |        
> 0 |          0 |                158943 | (0,0)
> (1 row)
> postgres=# select * from gp_persistent_relfile_node where 
> relfilenode_oid=3017232;
>  tablespace_oid | database_oid | relfilenode_oid | segment_file_num | 
> relation_storage_manager | persistent_state | relation_bufpool_kind | 
> parent_xid | persistent_serial_num | previous_free_tid
> ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+-------------------
>           16385 |        16387 |         3017232 |                1 |         
>                2 |                2 |                     0 |          0 |    
>             855050 | (0,0)
>           16385 |        16387 |         3017232 |                2 |         
>                2 |                2 |                     0 |          0 |    
>             855051 | (0,0)
> (2 rows)
> hadoop fs -ls /hawq_default/16385/16387/3017232
> -rw-------   3 wangchunling supergroup  100103584 2016-11-08 17:02 
> /hawq_default/16385/16387/3017232/1
> -rw-------   3 wangchunling supergroup  100103600 2016-11-08 17:02 
> /hawq_default/16385/16387/3017232/2
> {code}
> 3. Rebuilt persistent tables.
> {code}
> postgres=# insert into a select generate_series(10000000, 20000000);
> INSERT 0 10000001
> postgres=# select gp_persistent_reset_all();
>  gp_persistent_reset_all
> -------------------------
>                        1
> (1 row)
> postgres=# select gp_persistent_build_all(false);
>  gp_persistent_build_all
> -------------------------
>                        1
> (1 row)
> {code}
> 4. Check persistent table and find data lost in gp_relfile_node and 
> gp_persistent_relfile_node.
> {code}
> postgres=# select * from gp_relfile_node where relfilenode_oid=3017232;
>  relfilenode_oid | segment_file_num | persistent_tid | persistent_serial_num
> -----------------+------------------+----------------+-----------------------
> (0 rows)
> postgres=# select * from gp_persistent_relation_node where 
> relfilenode_oid=3017232;
>  tablespace_oid | database_oid | relfilenode_oid | persistent_state | 
> reserved | parent_xid | persistent_serial_num | previous_free_tid
> ----------------+--------------+-----------------+------------------+----------+------------+-----------------------+-------------------
>           16385 |        16387 |         3017232 |                2 |        
> 0 |          0 |                159020 | (0,0)
> (1 row)
> postgres=# select * from gp_persistent_relfile_node where 
> relfilenode_oid=3017232;
>  tablespace_oid | database_oid | relfilenode_oid | segment_file_num | 
> relation_storage_manager | persistent_state | relation_bufpool_kind | 
> parent_xid | persistent_serial_num | previous_free_tid
> ----------------+--------------+-----------------+------------------+--------------------------+------------------+-----------------------+------------+-----------------------+-------------------
> (0 rows)
> {code}
> 5. Drop talbe a and get error.
> {code}
> postgres=# DROP TABLE a;
> ERROR:  TID for persistent 'Relation Directory: '16385/16387/3017232'' tuple 
> is invalid (0,0) (index 2, transaction kind 'Commit') 
> (persistentendxactrec.c:264)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to