[ https://issues.apache.org/jira/browse/HUDI-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Vexler updated HUDI-5263:
----------------------------------
    Description:
When creating the table, for example:
{code:java}
create table hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.table.keygenerator.class = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
)
partitioned by (dt)
{code}
When attempting to cache the dataframe I read, I got the error:
{code:java}
assertion failed: Empty partition column value in 'partition_path='
java.lang.AssertionError: assertion failed: Empty partition column value in 'partition_path='
{code}
This will cause dt to always be null when you read the record. I don't know if the data is stored as null or just reads as null. If this is due to implementation issues and the only fix would be to fail the table creation, I think that is preferable to the current behavior.

  was:
When creating the table, for example:
{code:java}
create table hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.table.keygenerator.class = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
)
partitioned by (dt)
{code}
This will cause dt to always be null when you read the record. I don't know if the data is stored as null or just reads as null. If this is due to implementation issues and the only fix would be to fail the table creation, I think that is preferable to the current behavior.

> Setting partitioned by (partition_path) with nonpartitioned keygenerator in spark-sql will cause the column to be null
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5263
>                 URL: https://issues.apache.org/jira/browse/HUDI-5263
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: Jonathan Vexler
>            Priority: Major
>
> When creating the table, for example:
> {code:java}
> create table hudi_cow_pt_tbl (
>   id bigint,
>   name string,
>   ts bigint,
>   dt string,
>   hh string
> ) using hudi
> tblproperties (
>   type = 'cow',
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.table.keygenerator.class = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
> )
> partitioned by (dt)
> {code}
> When attempting to cache the dataframe I read, I got the error:
>
> {code:java}
> assertion failed: Empty partition column value in 'partition_path='
> java.lang.AssertionError: assertion failed: Empty partition column value in 'partition_path='
> {code}
>
> This will cause dt to always be null when you read the record. I don't know
> if the data is stored as null or just reads as null. If this is due to
> implementation issues and the only fix would be to fail the table creation, I
> think that is preferable to the current behavior.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)