[
https://issues.apache.org/jira/browse/FLINK-31975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042685#comment-18042685
]
huyuliang edited comment on FLINK-31975 at 12/4/25 2:35 AM:
------------------------------------------------------------
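When the filesystem source is used, the path directory is traversed to discover the partition structure. The logic lives in the listStatusRecursively method of PartitionPathUtils. That method does not handle hidden files, so when the path directory contains a partition that has not been completely written (a directory that does not have the partition structure), FileSystemTableSource.toFullLinkedPartSpec throws: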
```java
for (String k : partitionKeys) {
    if (!part.containsKey(k)) {
        throw new TableException(
                "Partition keys are: "
                        + partitionKeys
                        + ", incomplete partition spec: "
                        + part);
    }
    map.put(k, part.get(k));
}
```
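To illustrate the failure mode, here is a minimal standalone sketch of a recursive listing that skips hidden entries. It is not Flink code and not a proposed patch: the class `HiddenAwarePartitionLister` and its methods are hypothetical, it uses `java.nio.file` rather than Flink's `FileSystem` abstraction, and it assumes that in-progress data sits under directories whose names start with `.` or `_`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Flink): lists partition directories, skipping hidden ones. */
final class HiddenAwarePartitionLister {

    /** Assumed convention: names starting with '.' or '_' mark hidden or in-progress entries. */
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith(".") || name.startsWith("_");
    }

    /** Collects directories exactly {@code levels} below {@code root}, ignoring hidden entries. */
    static List<Path> listPartitionDirs(Path root, int levels) throws IOException {
        List<Path> result = new ArrayList<>();
        collect(root, levels, result);
        return result;
    }

    private static void collect(Path dir, int remaining, List<Path> out) throws IOException {
        if (remaining == 0) {
            out.add(dir);
            return;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                // Without the isHidden check, a staging directory at the right depth
                // would be returned and later parsed into an empty partition spec.
                if (Files.isDirectory(child) && !isHidden(child)) {
                    collect(child, remaining - 1, out);
                }
            }
        }
    }

    /** Parses the "key=value" path segments below {@code root} into an ordered spec. */
    static Map<String, String> toPartitionSpec(Path root, Path partitionDir) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (Path segment : root.relativize(partitionDir)) {
            String s = segment.toString();
            int eq = s.indexOf('=');
            if (eq > 0) {
                spec.put(s.substring(0, eq), s.substring(eq + 1));
            }
        }
        return spec;
    }
}
```

With two partition keys, a directory such as `.staging_1/task-0` (a made-up name) is two levels below the root but contains no `key=value` segments. A listing that does not filter it out would yield an empty spec, which is consistent with the `incomplete partition spec: {}` message in the report.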
was (Author: JIRAUSER309438):
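When the filesystem source is used, the path directory is traversed to discover the partition structure. The logic lives in the listStatusRecursively method of PartitionPathUtils. That method does not handle hidden files, so when the path directory contains a partition that has not been completely written (a directory that does not have the partition structure), FileSystemTableSource.toFullLinkedPartSpec throws: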
```
for (String k : partitionKeys) {
    if (!part.containsKey(k)) {
        throw new TableException(
                "Partition keys are: "
                        + partitionKeys
                        + ", incomplete partition spec: "
                        + part);
    }
    map.put(k, part.get(k));
}
```
> default catalog failed to retrieve partition Spec
> -------------------------------------------------
>
> Key: FLINK-31975
> URL: https://issues.apache.org/jira/browse/FLINK-31975
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.16.0
> Reporter: Samrat Deb
> Priority: Major
>
> Here is the attached repro for the error.
> - Flink 1.16.0 cluster
>
>
> {code:java}
> Flink SQL> show current catalog
> > ;
> +----------------------+
> | current catalog name |
> +----------------------+
> | default_catalog |
> +----------------------+
> 1 row in set
> Flink SQL> show tables;
> +-------------------+
> | table name |
> +-------------------+
> | country_page_view |
> | page_view_source |
> | part_table |
> +-------------------+
> 3 rows in set
> Flink SQL> drop table page_view_source;
> [INFO] Execute statement succeed.
> Flink SQL> drop table country_page_view;
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE page_view_source (`user` STRING, `cnt` INT, `date`
> STRING, `country` STRING)
> > WITH (
> > 'connector' = 'datagen', 'number-of-rows' = '10'
> > );
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE country_page_view (`user` STRING, `cnt` INT, `date`
> STRING, `country` STRING)
> > PARTITIONED BY (`date`, `country`)
> > WITH (
> >
> > 'format' = 'csv',
> > 'path' =
> > 's3://dbsamrat-emr-dev/glue-catalog/dbsamrat/country_page_view/',
> > 'connector' = 'filesystem'
> > )
> > ;
> [INFO] Execute statement succeed.
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30',
> `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> >
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:36,133 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:36,134 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:36,135 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:36,135 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:36,149 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 7c39db71be1f1b52e13a72831fed8105
> Flink SQL> EXECUTE INSERT INTO country_page_view PARTITION
> (`date`='2019-8-30', `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:41,424 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:41,424 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:41,424 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:41,424 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:41,427 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 69e18cb23f505528948a6398390ad070
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30')
> > SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:47,509 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:47,509 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:47,509 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:47,510 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:47,512 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: dc82613e0f2f8a2bafc61dcd35486f4e
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30',
> `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:53,534 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:53,534 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:53,535 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:53,535 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:53,542 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 117900654da5a89ce517d85383d4fe4a
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30')
> > SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:58,834 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:58,834 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:58,834 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:58,835 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:58,838 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: ca63640e867b9309b8c69d4dba7d94b1
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30',
> `country`='China') (`user`)
> > SELECT user FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:52:04,467 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:52:04,469 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:52:04,470 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:52:04,470 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:52:04,474 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 8bca09468a1193f47500ab3eadf04375
> {code}
>
> Finally, selecting rows from the table throws the following error:
> {code:java}
> Flink SQL> select * from country_page_view;
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.TableException: Partition keys are: [date,
> country], incomplete partition spec: {}
> Flink SQL>
> {code}
>
>