[ 
https://issues.apache.org/jira/browse/IMPALA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489604#comment-16489604
 ] 

Gabor Kaszab commented on IMPALA-6119:
--------------------------------------

[~bharathv] If I understand your proposal correctly, changing the for loop in 
resetAndLoadFileMetadata() that goes through the received partitions, so that 
it adds the new file descriptors to each of them, should fix this issue. Is 
that right?

My concern is that resetAndLoadFileMetadata() is apparently not invoked when I 
insert into my test table. The insert most probably takes the other path, 
through refreshFileMetadata().
I guess I could do something similar in that function as well. However, the 
'partitions' parameter of these functions would hold only the b=1 partition 
(using the test case in the description), so the other partitions pointing to 
the same location would still have to be found, and we are back to choosing 
between solutions (1) and (2), right?
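
To illustrate the lookup problem described above, here is a minimal sketch of 
one way to find every partition that shares a location. It is hypothetical: 
Part and SharedLocationIndex are stand-ins invented for this example, not 
Impala's HdfsPartition or HdfsTable, and the approach is not necessarily 
either of the solutions discussed in this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a partition; not Impala's HdfsPartition.
class Part {
  final String name;
  final String location;
  List<String> fileDescriptors = new ArrayList<>();

  Part(String name, String location) {
    this.name = name;
    this.location = location;
  }
}

// Hypothetical index from a partition directory to all partitions using it.
class SharedLocationIndex {
  private final Map<String, List<Part>> byLocation = new HashMap<>();

  // Strip one trailing slash so ".../b=1" and ".../b=1/" share a key.
  static String normalize(String location) {
    return location.endsWith("/")
        ? location.substring(0, location.length() - 1)
        : location;
  }

  void add(Part p) {
    byLocation.computeIfAbsent(normalize(p.location), k -> new ArrayList<>()).add(p);
  }

  // Apply freshly loaded descriptors to every partition under the location,
  // not only to the partition that the insert targeted.
  void setFileDescriptors(String location, List<String> newFileDescs) {
    List<Part> parts = byLocation.getOrDefault(normalize(location), new ArrayList<>());
    for (Part p : parts) {
      p.fileDescriptors = new ArrayList<>(newFileDescs);
    }
  }
}
```

With such an index, a refresh triggered by an insert into b=1 could look up 
the partition directory and update b=2 as well, at the cost of keeping the 
index in sync whenever partitions are added, dropped, or relocated.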

Am I missing something?

> Inconsistent file metadata updates when multiple partitions point to the same 
> path
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-6119
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6119
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: bharath v
>            Assignee: Gabor Kaszab
>            Priority: Critical
>              Labels: correctness, ramp-up
>
> The following steps can give inconsistent results.
> {noformat}
> // Create a partitioned table
> create table test(a int) partitioned by (b int);
> // Create two partitions b=1/b=2 mapped to the same HDFS location.
> insert into test partition(b=1) values (1);
> alter table test add partition (b=2) location 'hdfs://localhost:20500/test-warehouse/test/b=1/';
> [localhost:21000] > show partitions test;
> Query: show partitions test
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                       |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | Total | -1    | 2      | 4B   | 0B           |                   |        |                   |                                                |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> // Insert new data into one of the partitions
> insert into test partition(b=1) values (2);
> // Newly added file is reflected only in the added partition files. 
> show files in test;
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> invalidate metadata test;
>  show files in test;
> // After invalidation, the newly added file now shows up in both the partitions.
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=2       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> {noformat}
> So, depending on whether the user invalidates the table, they can see 
> different results. The bug is in the following code.
> {noformat}
> private FileMetadataLoadStats resetAndLoadFileMetadata(
>       Path partDir, List<HdfsPartition> partitions) throws IOException {
>     FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
> ....
> ....
> ....
>     for (HdfsPartition partition: partitions) {
>       partition.setFileDescriptors(newFileDescs);  // <======
>     }
> {noformat}
> We only update the added file metadata for the new partition (in a 
> copy-on-write way). Instead, we should update the source descriptors so that 
> the change is reflected in the other partitions too.


