[ https://issues.apache.org/jira/browse/IMPALA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489375#comment-16489375 ]
Philip Zeyliger commented on IMPALA-6119:
-----------------------------------------

Does it make sense to allow two partitions to have the same location? Is Impala's behavior consistent with Hive and Spark when this happens?

> Inconsistent file metadata updates when multiple partitions point to the same path
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-6119
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6119
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: bharath v
>            Assignee: Gabor Kaszab
>            Priority: Critical
>              Labels: correctness, ramp-up
>
> The following steps can give inconsistent results.
> {noformat}
> // Create a partitioned table
> create table test(a int) partitioned by (b int);
> // Create two partitions b=1/b=2 mapped to the same HDFS location.
> insert into test partition(b=1) values (1);
> alter table test add partition (b=2) location 'hdfs://localhost:20500/test-warehouse/test/b=1/'
> [localhost:21000] > show partitions test;
> Query: show partitions test
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                       |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | Total | -1    | 2      | 4B   | 0B           |                   |        |                   |                                                |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> // Insert new data into one of the partitions
> insert into test partition(b=1) values (2);
> // The newly added file is reflected only in the file list of the partition that was inserted into (b=1).
> show files in test;
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> invalidate metadata test;
> show files in test;
> // After invalidation, the newly added file now shows up in both partitions.
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=2       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> {noformat}
> So, depending on whether the user invalidates the table, they can see different results. The bug is in the following code.
> {noformat}
> private FileMetadataLoadStats resetAndLoadFileMetadata(
>     Path partDir, List<HdfsPartition> partitions) throws IOException {
>   FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
>   ....
>   ....
>   ....
>   for (HdfsPartition partition: partitions)
>     partition.setFileDescriptors(newFileDescs); <======
> {noformat}
> We only update the added file metadata for the new partition (in a copy-on-write way). Instead, we should update the source descriptors so that the change is reflected in the other partitions too.
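
The symptom in the report (one partition's file list going stale until the table is invalidated) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `SharedLocationSketch`, `ToyCatalog`, `ToyPartition`, and every method below are hypothetical names invented for illustration, not Impala classes, and the "refresh all partitions at the location" variant is just one possible remedy in this toy model, not necessarily the fix Impala adopted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of per-partition file-list caching; not Impala code.
class SharedLocationSketch {

  /** A partition that caches the file names found under its location. */
  static final class ToyPartition {
    final String name;
    final String location;
    List<String> cachedFiles = new ArrayList<>();

    ToyPartition(String name, String location) {
      this.name = name;
      this.location = location;
    }
  }

  /** A catalog holding partitions plus a stand-in for the file system. */
  static final class ToyCatalog {
    final Map<String, List<String>> filesOnDisk = new HashMap<>();
    final List<ToyPartition> partitions = new ArrayList<>();

    /** Returns a fresh copy of the files that currently exist at a location. */
    List<String> listFiles(String location) {
      return new ArrayList<>(filesOnDisk.getOrDefault(location, new ArrayList<>()));
    }

    /** Registers a partition and loads its file list once. */
    ToyPartition addPartition(String name, String location) {
      ToyPartition p = new ToyPartition(name, location);
      p.cachedFiles = listFiles(location);
      partitions.add(p);
      return p;
    }

    /** Models the reported behavior: only the insert target is refreshed. */
    void insertRefreshingTargetOnly(ToyPartition target, String newFile) {
      filesOnDisk.computeIfAbsent(target.location, k -> new ArrayList<>()).add(newFile);
      target.cachedFiles = listFiles(target.location);
    }

    /** One possible remedy: refresh every partition mapped to the same location. */
    void insertRefreshingAllAtLocation(ToyPartition target, String newFile) {
      filesOnDisk.computeIfAbsent(target.location, k -> new ArrayList<>()).add(newFile);
      for (ToyPartition p : partitions) {
        if (p.location.equals(target.location)) {
          p.cachedFiles = listFiles(p.location);
        }
      }
    }

    /** Models "invalidate metadata": reload every partition from the file system. */
    void invalidate() {
      for (ToyPartition p : partitions) {
        p.cachedFiles = listFiles(p.location);
      }
    }
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();
    String loc = "/test-warehouse/test/b=1";

    ToyPartition b1 = catalog.addPartition("b=1", loc);
    catalog.insertRefreshingTargetOnly(b1, "file_1");
    ToyPartition b2 = catalog.addPartition("b=2", loc); // same location as b=1
    catalog.insertRefreshingTargetOnly(b1, "file_2");

    // b=2 still holds the list it loaded when it was added.
    System.out.println("before invalidate: b=1 -> " + b1.cachedFiles + ", b=2 -> " + b2.cachedFiles);

    catalog.invalidate();
    System.out.println("after invalidate:  b=1 -> " + b1.cachedFiles + ", b=2 -> " + b2.cachedFiles);
  }
}
```

In this model the two partitions disagree until `invalidate()` runs, which mirrors the two different `show files` results above; using `insertRefreshingAllAtLocation` instead keeps both cached lists in step after every insert.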