[ https://issues.apache.org/jira/browse/HUDI-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-4932:
----------------------------------
    Status: In Progress  (was: Open)

> Add a config to allow partition column type inference in bootstrap
> ------------------------------------------------------------------
>
>                 Key: HUDI-4932
>                 URL: https://issues.apache.org/jira/browse/HUDI-4932
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap
>            Reporter: Ethan Guo
>            Assignee: Jonathan Vexler
>            Priority: Major
>
> Currently, we assume that the partition column is always of String type
> during the bootstrap operation.
> TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails
> for a date partition column if type inference of the partition column is
> turned on.
>
> We need to add a config to allow partition column type inference in bootstrap
> so that other types of partition columns are supported.
>
> HoodieSparkBootstrapSchemaProvider
> {code:java}
> private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
>   // NOTE: The type inference of partition column in the parquet table is turned off explicitly,
>   // to be consistent with the existing bootstrap behavior, where the partition column is String
>   // typed in Hudi table.
>   ((HoodieSparkEngineContext) context).getSqlContext()
>       .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
>   StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
>       .option("basePath", writeConfig.getBootstrapSourceBasePath())
>       .parquet(filePath.toString())
>       .schema();
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)