roland created FLINK-35118: ------------------------------ Summary: StreamingHiveSource cannot track tables that have more than 32,767 partitions Key: FLINK-35118 URL: https://issues.apache.org/jira/browse/FLINK-35118 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.19.0 Reporter: roland
*Description:* The Streaming Hive Source cannot track tables that have more than 32,767 partitions. *Root Cause:* The Streaming Hive Source uses the following lines to get all partitions of a table: ([git hub link|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HivePartitionFetcherContextBase.java#L130]) HivePartitionFetcherContextBase.java: {code:java} @Override public List<ComparablePartitionValue> getComparablePartitionValueList() throws Exception { List<ComparablePartitionValue> partitionValueList = new ArrayList<>(); switch (partitionOrder) { case PARTITION_NAME: List<String> partitionNames = metaStoreClient.listPartitionNames( tablePath.getDatabaseName(), tablePath.getObjectName(), Short.MAX_VALUE); for (String partitionName : partitionNames) { partitionValueList.add(getComparablePartitionByName(partitionName)); } break; case CREATE_TIME: Map<List<String>, Long> partValuesToCreateTime = new HashMap<>(); partitionNames = metaStoreClient.listPartitionNames( tablePath.getDatabaseName(), tablePath.getObjectName(), Short.MAX_VALUE); {code} Where the `metaStoreClient` is a wrapper of the `IMetaStoreClient`, and the function `listPartitionNames` can only list no more than `Short.MAX_VALUE` partitions, which is 32,767. For tables that have more partitions, the source fails to track new partitions and read from it. -- This message was sent by Atlassian Jira (v8.20.10#820010)