[jira] [Updated] (HUDI-5263) Setting partitioned by (partition_path) with nonpartitioned keygenerator in spark-sql will cause the column to be null

Jonathan Vexler (Jira) Tue, 22 Nov 2022 07:29:06 -0800


     [ https://issues.apache.org/jira/browse/HUDI-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jonathan Vexler updated HUDI-5263:
----------------------------------
    Description: 
When creating the table, for example: 
{code:sql}
create table hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.table.keygenerator.class = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
)
partitioned by (dt)
{code}
When I read the table into a DataFrame and tried to cache it, I got this error:

{code:java}
assertion failed: Empty partition column value in 'partition_path='
java.lang.AssertionError: assertion failed: Empty partition column value in 'partition_path='
{code}

With this table definition, dt is always null when the records are read back. I
don't know whether the data is stored as null or only reads as null. If this is
caused by implementation constraints and the only fix is to fail the table
creation, I think that is preferable to the current behavior.
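
A minimal sketch of the fail-fast check suggested above, written in Python for illustration only. It is not Hudi's implementation: validate_table_definition and its arguments are hypothetical names, and only the property key and the key generator class name are taken from the example table.

```python
# Values taken from the example table definition in this issue.
KEYGEN_PROP = "hoodie.table.keygenerator.class"
NONPARTITIONED_KEYGEN = "org.apache.hudi.keygen.NonpartitionedKeyGenerator"


def validate_table_definition(tbl_properties, partition_columns):
    """Hypothetical create-table check (not a real Hudi API).

    Rejects a definition that declares partition columns while also
    selecting the non-partitioned key generator.
    """
    keygen = tbl_properties.get(KEYGEN_PROP, "")
    if keygen == NONPARTITIONED_KEYGEN and partition_columns:
        cols = ", ".join(partition_columns)
        raise ValueError(
            f"partitioned by ({cols}) cannot be combined with {NONPARTITIONED_KEYGEN}"
        )


if __name__ == "__main__":
    props = {
        "type": "cow",
        "primaryKey": "id",
        "preCombineField": "ts",
        KEYGEN_PROP: NONPARTITIONED_KEYGEN,
    }
    try:
        validate_table_definition(props, ["dt"])
    except ValueError as err:
        print(f"table creation rejected: {err}")
```

With a check of this shape, the example table would be rejected at creation time instead of silently returning null for dt on read.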

  was:
When creating the table, for example: 
{code:sql}
create table hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.table.keygenerator.class = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
)
partitioned by (dt)
{code}
This will cause dt to always be null when you read the record. I don't know if 
the data is stored as null or just reads as null. If this is due to 
implementation issues and the only fix would be to fail the table creation, I 
think that is preferable to the current behavior.  


> Setting partitioned by (partition_path) with nonpartitioned keygenerator in 
> spark-sql will cause the column to be null
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5263
>                 URL: https://issues.apache.org/jira/browse/HUDI-5263
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: Jonathan Vexler
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
