[ https://issues.apache.org/jira/browse/HUDI-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-4626:
----------------------------
    Description: 
 

Currently, creating a table partitioned by `_hoodie_partition_path` fails with
the following exception:
{code:java}
AnalysisException: Found duplicate column(s) in the data schema and the 
partition schema: _hoodie_partition_path
{code}
The failure is reproduced by the following DDL:
{code:sql}
CREATE EXTERNAL TABLE `active_storage_attachments`(
  `_hoodie_commit_time` string COMMENT '',
  `_hoodie_commit_seqno` string COMMENT '',
  `_hoodie_record_key` string COMMENT '',
  `_hoodie_file_name` string COMMENT '',
  `_change_operation_type` string COMMENT '',
  `_upstream_event_processed_ts_ms` bigint COMMENT '',
  `db_shard_source_partition` string COMMENT '',
  `_event_origin_ts_ms` bigint COMMENT '',
  `_event_tx_id` bigint COMMENT '',
  `_event_lsn` bigint COMMENT '',
  `_event_xmin` bigint COMMENT '',
  `id` bigint COMMENT '',
  `name` string COMMENT '',
  `record_type` string COMMENT '',
  `record_id` bigint COMMENT '',
  `blob_id` bigint COMMENT '',
  `created_at` timestamp COMMENT '')
PARTITIONED BY (
  `_hoodie_partition_path` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'hoodie.query.as.ro.table'='false',
  'path'='...')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '...'
TBLPROPERTIES (
  'spark.sql.sources.provider'='hudi')
{code}
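For illustration only, the failure mode in the exception above can be sketched as an overlap check between the data schema and the partition schema. This is not Spark's or Hudi's actual code: `find_duplicate_columns` and the column lists are hypothetical, and the sketch assumes the data schema resolved for the table already contains `_hoodie_partition_path` as a meta column even though the DDL does not list it among the data columns.

```python
def find_duplicate_columns(data_columns, partition_columns, case_sensitive=False):
    """Return the partition column names that also appear among the data columns."""
    # Hypothetical helper: compare names case-insensitively unless told otherwise.
    normalize = (lambda name: name) if case_sensitive else str.lower
    data_names = {normalize(name) for name in data_columns}
    return [name for name in partition_columns if normalize(name) in data_names]


# Assumption: the resolved data schema already carries the meta column.
data_columns = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
    "id",
    "name",
]
partition_columns = ["_hoodie_partition_path"]

duplicates = find_duplicate_columns(data_columns, partition_columns)
if duplicates:
    # Mirrors the wording of the reported exception message.
    message = (
        "Found duplicate column(s) in the data schema and the "
        "partition schema: " + ", ".join(duplicates)
    )
    print(message)
```

Under that assumption, the same column is seen on both sides of the check, which would explain why the DDL is rejected even though `_hoodie_partition_path` is declared only once.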
 

 

  was:
 

Currently, creating a table partitioned by `_hoodie_partition_path` fails with
the following exception:
{code:java}
// TBA
{code}
The failure is reproduced by the following DDL:
{code:sql}
CREATE EXTERNAL TABLE `active_storage_attachments`(
  `_hoodie_commit_time` string COMMENT '',
  `_hoodie_commit_seqno` string COMMENT '',
  `_hoodie_record_key` string COMMENT '',
  `_hoodie_file_name` string COMMENT '',
  `_change_operation_type` string COMMENT '',
  `_upstream_event_processed_ts_ms` bigint COMMENT '',
  `db_shard_source_partition` string COMMENT '',
  `_event_origin_ts_ms` bigint COMMENT '',
  `_event_tx_id` bigint COMMENT '',
  `_event_lsn` bigint COMMENT '',
  `_event_xmin` bigint COMMENT '',
  `id` bigint COMMENT '',
  `name` string COMMENT '',
  `record_type` string COMMENT '',
  `record_id` bigint COMMENT '',
  `blob_id` bigint COMMENT '',
  `created_at` timestamp COMMENT '')
PARTITIONED BY (
  `_hoodie_partition_path` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'hoodie.query.as.ro.table'='false',
  'path'='...')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '...'
TBLPROPERTIES (
  'spark.sql.sources.provider'='hudi')
{code}
 

 


> Partitioning table by `_hoodie_partition_path` fails
> ----------------------------------------------------
>
>                 Key: HUDI-4626
>                 URL: https://issues.apache.org/jira/browse/HUDI-4626
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Alexey Kudinkin
>            Priority: Blocker
>
>  
> Currently, creating a table partitioned by `_hoodie_partition_path` fails with
> the following exception:
> {code:java}
> AnalysisException: Found duplicate column(s) in the data schema and the 
> partition schema: _hoodie_partition_path
> {code}
> The failure is reproduced by the following DDL:
> {code:sql}
> CREATE EXTERNAL TABLE `active_storage_attachments`(
>   `_hoodie_commit_time` string COMMENT '',
>   `_hoodie_commit_seqno` string COMMENT '',
>   `_hoodie_record_key` string COMMENT '',
>   `_hoodie_file_name` string COMMENT '',
>   `_change_operation_type` string COMMENT '',
>   `_upstream_event_processed_ts_ms` bigint COMMENT '',
>   `db_shard_source_partition` string COMMENT '',
>   `_event_origin_ts_ms` bigint COMMENT '',
>   `_event_tx_id` bigint COMMENT '',
>   `_event_lsn` bigint COMMENT '',
>   `_event_xmin` bigint COMMENT '',
>   `id` bigint COMMENT '',
>   `name` string COMMENT '',
>   `record_type` string COMMENT '',
>   `record_id` bigint COMMENT '',
>   `blob_id` bigint COMMENT '',
>   `created_at` timestamp COMMENT '')
> PARTITIONED BY (
>   `_hoodie_partition_path` string COMMENT '')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'hoodie.query.as.ro.table'='false',
>   'path'='...')
> STORED AS INPUTFORMAT
>   'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   '...'
> TBLPROPERTIES (
>   'spark.sql.sources.provider'='hudi')
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
