roland created FLINK-35118:
------------------------------

             Summary: StreamingHiveSource cannot track tables that have more 
than 32,767 partitions
                 Key: FLINK-35118
                 URL: https://issues.apache.org/jira/browse/FLINK-35118
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Hive
    Affects Versions: 1.19.0
            Reporter: roland


*Description:*

The Streaming Hive Source cannot track tables that have more than 32,767 
partitions.

 *Root Cause:*

The Streaming Hive Source uses the following lines to get all partitions of a 
table:

([git hub 
link|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HivePartitionFetcherContextBase.java#L130])

HivePartitionFetcherContextBase.java:
{code:java}
    @Override
    public List<ComparablePartitionValue> getComparablePartitionValueList() 
throws Exception {
        List<ComparablePartitionValue> partitionValueList = new ArrayList<>();
        switch (partitionOrder) {
            case PARTITION_NAME:
                List<String> partitionNames =
                        metaStoreClient.listPartitionNames(
                                tablePath.getDatabaseName(),
                                tablePath.getObjectName(),
                                Short.MAX_VALUE);
                for (String partitionName : partitionNames) {
                    
partitionValueList.add(getComparablePartitionByName(partitionName));
                }
                break;
            case CREATE_TIME:
                Map<List<String>, Long> partValuesToCreateTime = new 
HashMap<>();
                partitionNames =
                        metaStoreClient.listPartitionNames(
                                tablePath.getDatabaseName(),
                                tablePath.getObjectName(),
                                Short.MAX_VALUE); {code}
Where the `metaStoreClient` is a wrapper of the `IMetaStoreClient`, and the 
function `listPartitionNames` can only list no more than `Short.MAX_VALUE` 
partitions, which is 32,767.

 

For tables that have more partitions, the source fails to track new partitions 
and read from it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to