blackflash997997 opened a new issue, #7108:
URL: https://github.com/apache/paimon/issues/7108

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   flink catalog:
   ```
   CREATE CATALOG paimon_catalog WITH (
       'type' = 'paimon',
       'metastore' = 'hive',
       'uri' = 'thrift://xxx:9083',
       'warehouse' = 'jfs://poc-jfs/user/hive/lakehouse_paimon',
       'table-default.metadata.iceberg.storage'='hive-catalog',
    'table-default.metadata.iceberg.uri' = 'thrift://xxx:9083'
   );
   
   USE CATALOG paimon_catalog;
   ```
   I'm using the following SQL to write data from Kafka:
   ```
   insert into my_database.paimon_table
   select *,
   DATE_FORMAT(SYSTEMDATE, 'yyyyMMdd') -- this is the partition field: dt
   from kafka
   where `TIME` IS NOT NULL
   ;
   ```
   Then I query with spark-sql:
   ```
   spark-sql  --conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
--conf spark.sql.catalog.spark_catalog.type=hive
   
   select * from my_database.paimon_table where dt=20251231
   ```
   
   I found that Spark scans all the data to find the `dt=20251231` rows, instead of pruning to the matching partition.
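   One way to make the missing pruning visible is to inspect the physical plan (this is a standard Spark SQL `EXPLAIN`; the table name is the one above):
   ```
   -- If partition pruning works, the physical plan should contain a
   -- partition filter on dt; with this bug it shows a full table scan.
   EXPLAIN EXTENDED
   select * from my_database.paimon_table where dt=20251231;
   ```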
   
   
   I also found that `describe formatted my_database.paimon_table;` does not display the `# Metadata Columns` section that is present when the `paimon` table is created via spark-sql.
   
   As shown below, there is no `# Metadata Columns` section, which causes filtering by the partition field to fail:
   ```
   ......
   dt                      string                  from deserializer   
                                                                       
   # Detailed Table Information                                                
   Catalog                 spark_catalog                               
   Database                paimon_flink1                               
   Table                   zvos_flink_14_append2                       
   Owner                   zoomspace                                   
   Created Time            Thu Jan 22 17:40:42 CST 2026                        
   Last Access             Thu Jan 22 17:40:42 CST 2026                        
   Created By              Spark 2.2 or prior                          
   Type                    MANAGED                                     
   Provider                hive                                        
   Comment                                                             
   Table Properties        [metadata.iceberg.storage=hive-catalog, 
metadata.iceberg.uri=thrift://xxxx:9083, 
metadata_location=jfs://poc-jfs/user/hive/lakehouse_paimon/iceberg/paimon_flink1/zvos_flink_14_append2/metadata/v3.metadata.json,
 partition=dt, 
previous_metadata_location=jfs://poc-jfs/user/hive/lakehouse_paimon/iceberg/paimon_flink1/zvos_flink_14_append2/metadata/v2.metadata.json,
 storage_handler=org.apache.paimon.hive.PaimonStorageHandler, 
table_type=PAIMON, transient_lastDdlTime=1769074842]                         
   Statistics              87763321 bytes                              
   Location                
jfs://poc-jfs/user/hive/lakehouse_paimon/paimon_flink1.db/zvos_flink_14_append2 
                    
   Serde Library           org.apache.paimon.hive.PaimonSerDe                   
       
   InputFormat             org.apache.paimon.hive.mapred.PaimonInputFormat      
               
   OutputFormat            org.apache.paimon.hive.mapred.PaimonOutputFormat     
                       
   
   ```
   
   ### Compute Engine
   
   Paimon version: 1.4.1-SNAPSHOT
   Write: Flink 1.20.1 on YARN, with the JuiceFS filesystem
   Read: Spark 3.5.2, Iceberg 1.6.1
   
   ### Minimal reproduce step
   
   Use Flink to write to a partitioned table with Iceberg metadata enabled,
   then use ` where partition-key=xxxx ` to filter the data.
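   
   For reference, the table being written was created along these lines (a sketch in Flink SQL inside `paimon_catalog`; the column names besides `dt` and `TIME` are illustrative, not the exact DDL):
   ```
   -- hypothetical sketch: a partitioned Paimon table written by the Flink job
   CREATE TABLE my_database.paimon_table (
       `TIME`  TIMESTAMP(3),
       payload STRING,
       dt      STRING   -- partition field, derived via DATE_FORMAT
   ) PARTITIONED BY (dt);
   ```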
   
   ### What doesn't meet your expectations?
   
   Filtering with ` where partition-key=xxxx ` should scan only the matching
   partition path, not all the data.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
