[ https://issues.apache.org/jira/browse/HUDI-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4932:
---------------------------------
    Labels: pull-request-available  (was: )

> Add a config to allow partition column type inference in bootstrap
> ------------------------------------------------------------------
>
>                 Key: HUDI-4932
>                 URL: https://issues.apache.org/jira/browse/HUDI-4932
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap
>            Reporter: Ethan Guo
>            Assignee: Jonathan Vexler
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, we assume that the partition column is always of String type
> during the bootstrap operation.
> TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails
> for a date partition column if partition column type inference is turned on.
>  
> We need to add a config to allow partition column type inference in bootstrap
> so that partition columns of other types are supported.
>  
> HoodieSparkBootstrapSchemaProvider
> {code:java}
> private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
>   // NOTE: The type inference of partition column in the parquet table is turned off explicitly,
>   // to be consistent with the existing bootstrap behavior, where the partition column is String
>   // typed in Hudi table.
>   ((HoodieSparkEngineContext) context).getSqlContext()
>       .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
>   StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
>       .option("basePath", writeConfig.getBootstrapSourceBasePath())
>       .parquet(filePath.toString())
>       .schema();
>   // ... (remainder of method elided)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)