[ 
https://issues.apache.org/jira/browse/IMPALA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860358#comment-16860358
 ] 

Joe McDonnell commented on IMPALA-8630:
---------------------------------------

The partition_id in THdfsFileSplit is generated by a static atomic counter. 

 
{code:java}
private static AtomicLong partitionIdCounter_ = new AtomicLong();
...
public HdfsPartition(HdfsTable table,
    org.apache.hadoop.hive.metastore.api.Partition msPartition,
    List<LiteralExpr> partitionKeyValues,
    HdfsStorageDescriptor fileFormatDescriptor,
    List<HdfsPartition.FileDescriptor> fileDescriptors,
    TAccessLevel accessLevel) {
  this(table, msPartition, partitionKeyValues, fileFormatDescriptor, fileDescriptors,
       partitionIdCounter_.getAndIncrement(),
...{code}
This means the partition_id is not stable. I verified that it changes after 
running INVALIDATE METADATA on the table, so consistent scheduling needs to use 
something else.
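
To illustrate why a counter-based id is unstable, here is a minimal sketch. It 
is not the Impala code: the class UnstablePartitionId and its nested Partition 
are hypothetical stand-ins that only mimic the static AtomicLong pattern shown 
above, and the "reload" is simulated by constructing the object twice.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UnstablePartitionId {
  // Mirrors the pattern above: one process-wide counter hands out ids.
  private static final AtomicLong partitionIdCounter = new AtomicLong();

  // Hypothetical stand-in for HdfsPartition; the id depends only on creation order.
  static final class Partition {
    final String path;
    final long id;

    Partition(String path) {
      this.path = path;
      this.id = partitionIdCounter.getAndIncrement();
    }
  }

  public static void main(String[] args) {
    String path = "/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642";
    Partition beforeReload = new Partition(path);
    // Simulates a metadata reload that rebuilds the partition object.
    Partition afterReload = new Partition(path);
    // Same path, but the two objects carry different ids.
    System.out.println(beforeReload.id + " -> " + afterReload.id);
  }
}
```

Any placement hash that mixes in such an id would move files to different 
nodes whenever the partition objects are rebuilt.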

One option is to hash the actual path of the partition. The catch is that at 
scheduling time the partition path is compressed and encapsulated in the 
DescriptorTbl. It would be nice to avoid reconstructing the path and hashing it 
on every query, so it makes sense to compute the hash once and store it in 
catalog/HdfsPartition (or some equivalent). Once it is there, the scheduler 
still has to go through the DescriptorTbl thrift to get it (which would probably 
involve building a data structure). Since the hash is a single number, it might 
be easier to pass a partition_path_hash alongside partition_id in 
THdfsFileSplit.
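
A rough sketch of the "compute once and stash" idea follows. Everything in it 
is an assumption for illustration: the class PartitionPathHash, the nested 
Partition, and the pathHash field are hypothetical, and 64-bit FNV-1a is just 
a convenient deterministic hash, not necessarily what Impala would use.

```java
import java.nio.charset.StandardCharsets;

public class PartitionPathHash {
  // Standard 64-bit FNV-1a parameters.
  private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
  private static final long FNV_PRIME = 0x100000001b3L;

  // Deterministic hash of the path bytes; independent of object creation order.
  static long hashPath(String partitionPath) {
    long hash = FNV_OFFSET_BASIS;
    for (byte b : partitionPath.getBytes(StandardCharsets.UTF_8)) {
      hash ^= (b & 0xff);
      hash *= FNV_PRIME;
    }
    return hash;
  }

  // Hypothetical stand-in for the catalog partition object. The hash is
  // computed once at construction, so nothing is rehashed per query.
  static final class Partition {
    final String path;
    final long pathHash;

    Partition(String path) {
      this.path = path;
      this.pathHash = hashPath(path);
    }
  }

  public static void main(String[] args) {
    String base = "/test-warehouse/tpcds.store_sales/";
    Partition first = new Partition(base + "ss_sold_date_sk=2452642");
    Partition rebuilt = new Partition(base + "ss_sold_date_sk=2452642");
    Partition other = new Partition(base + "ss_sold_date_sk=2452640");
    System.out.println(first.pathHash == rebuilt.pathHash);
    System.out.println(first.pathHash == other.pathHash);
  }
}
```

In this sketch the value in pathHash is what would be copied into a 
partition_path_hash field on the split, so the scheduler never needs to 
reconstruct the path.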

 

> Consistent remote placement should include partition id when calculating 
> placement
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-8630
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8630
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>
> For partitioned tables, the filenames within partitions may have very little 
> entropy. Impala's own filenames include information that differs across 
> partitions, but low-entropy filenames are common for tables written by the 
> current CDH version of Hive. For example, in our minicluster, the TPC-DS 
> store_sales table has many partitions, but the actual filenames within 
> partitions are very simple:
> {noformat}
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 379535 2019-06-05 15:16 /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642/000000_0
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 412959 2019-06-05 15:16 /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640/000000_0{noformat}
> Right now, consistent remote placement uses the filename+offset without the 
> partition id.
> {code:java}
> uint32_t hash = HashUtil::Hash(hdfs_file_split->relative_path.data(),
>       hdfs_file_split->relative_path.length(), 0);
> {code}
> This would produce a poor balance of files across nodes when there is low 
> entropy in filenames. This should be amended to include the partition id, 
> which is already accessible on the THdfsFileSplit.


