[ 
https://issues.apache.org/jira/browse/FLINK-31975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042685#comment-18042685
 ] 

huyuliang commented on FLINK-31975:
-----------------------------------

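When the filesystem source is used, it traverses the path directory to discover the partition structure. The logic lives in the listStatusRecursively method of PartitionPathUtils, which does not handle hidden files. As a result, when the path directory contains a partition that has not been fully written (a directory that does not follow the partition structure), FileSystemTableSource.toFullLinkedPartSpec throws the following exception: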
```

// k is one of the table's partition keys; part is the spec parsed from a directory path
if (!part.containsKey(k)) {
    throw new TableException(
            "Partition keys are: "
                    + partitionKeys
                    + ", incomplete partition spec: "
                    + part);
}
```
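
To illustrate the idea of skipping hidden directories during the recursive listing, here is a minimal sketch. It is not Flink's PartitionPathUtils and not the actual fix: the class VisiblePartitionLister and its methods are hypothetical, it uses java.nio.file instead of Flink's FileSystem API, and it assumes that hidden or in-progress entries are the ones whose names start with "_" or ".".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
final class VisiblePartitionLister {

    private VisiblePartitionLister() {}

    // Assumed convention: names starting with '_' or '.' mark hidden or in-progress entries.
    static boolean isHiddenName(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns the directories exactly `levels` below `root`, never descending into hidden ones.
    static List<Path> listPartitionDirs(Path root, int levels) throws IOException {
        List<Path> result = new ArrayList<>();
        collect(root, levels, result);
        return result;
    }

    private static void collect(Path dir, int remaining, List<Path> out) throws IOException {
        if (remaining == 0) {
            // Reached the expected partition depth: this directory is a full partition path.
            out.add(dir);
            return;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                // Skip plain files and hidden directories at every level.
                if (Files.isDirectory(child) && !isHiddenName(name)) {
                    collect(child, remaining - 1, out);
                }
            }
        }
    }
}
```

With a filter like this, a call such as VisiblePartitionLister.listPartitionDirs(tablePath, 2) for a table partitioned by two keys would only return directories reached through visible path segments, so a partially written hidden directory would never produce an empty partition spec.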

> default catalog failed to retrieve partition Spec
> -------------------------------------------------
>
>                 Key: FLINK-31975
>                 URL: https://issues.apache.org/jira/browse/FLINK-31975
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Client
>    Affects Versions: 1.16.0
>            Reporter: Samrat Deb
>            Priority: Major
>
> Here is the attached repro for the error.
> -  Flink 1.16.0 cluster 
>  
>  
> {code:java}
> Flink SQL> show current catalog
> > ;
> +----------------------+
> | current catalog name |
> +----------------------+
> |      default_catalog |
> +----------------------+
> 1 row in set
> Flink SQL> show tables;
> +-------------------+
> |        table name |
> +-------------------+
> | country_page_view |
> |  page_view_source |
> |        part_table |
> +-------------------+
> 3 rows in set
> Flink SQL> drop table page_view_source;
> [INFO] Execute statement succeed.
> Flink SQL> drop table country_page_view;
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE  page_view_source (`user` STRING, `cnt` INT, `date` 
> STRING, `country` STRING)
> > WITH (
> >   'connector' = 'datagen',  'number-of-rows' = '10'
> > );
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE country_page_view (`user` STRING, `cnt` INT, `date` 
> STRING, `country` STRING)
> > PARTITIONED BY (`date`, `country`)
> > WITH (
> >
> >    'format' = 'csv',
> >    'path' = 
> > 's3://dbsamrat-emr-dev/glue-catalog/dbsamrat/country_page_view/',
> >    'connector' = 'filesystem'
> > )
> > ;
> [INFO] Execute statement succeed.
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> >
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:36,133 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:36,134 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:36,135 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:36,135 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:36,149 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 7c39db71be1f1b52e13a72831fed8105
> Flink SQL> EXECUTE INSERT INTO country_page_view PARTITION 
> (`date`='2019-8-30', `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:41,424 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:41,424 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:41,424 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:41,424 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:41,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 69e18cb23f505528948a6398390ad070
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30')
> >   SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:47,509 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:47,509 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:47,509 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:47,510 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:47,512 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: dc82613e0f2f8a2bafc61dcd35486f4e
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:53,534 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:53,534 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:53,535 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:53,535 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:53,542 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 117900654da5a89ce517d85383d4fe4a
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30')
> >   SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:58,834 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:58,834 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:58,834 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:58,835 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:58,838 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: ca63640e867b9309b8c69d4dba7d94b1
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China') (`user`)
> >   SELECT user FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:52:04,467 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:52:04,469 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:52:04,470 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:52:04,470 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:52:04,474 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 8bca09468a1193f47500ab3eadf04375
> {code}
>  
> Finally, selecting rows from the table throws the following error:
> {code:java}
> Flink SQL> select * from country_page_view;
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.TableException: Partition keys are: [date, 
> country], incomplete partition spec: {}
> Flink SQL>
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
