[ https://issues.apache.org/jira/browse/HUDI-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-4966:
---------------------------------
    Labels: pull-request-available  (was: )

> Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4966
>                 URL: https://issues.apache.org/jira/browse/HUDI-4966
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> For Deltastreamer, when using TimestampBasedKeyGenerator with a partition path output format containing slashes, e.g., "yyyy/MM/dd", and hive-style partitioning disabled (the default), the meta sync fails.
> {code:java}
> --hoodie-conf hoodie.datasource.write.partitionpath.field=createdDate
> --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS {code}
> Hive Sync exception:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
>     at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing test_table
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
>     at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
>     ... 19 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table test_table
>     at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
>     at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
>     ... 20 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: default.test_table add partition failed
>     at org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:217)
>     at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:107)
>     at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
>     ... 23 more
> Caused by: MetaException(message:Invalid partition key & values; keys [createddate, ], values [2022, 10, 02, ])
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result.read(ThriftHiveMetastore.java)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1911)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1898)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>     at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2327)
>     at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
>     at org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:212)
>     ... 25 more{code}
> Glue Sync exception:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
>     at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing test_table
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
>     at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
>     ... 19 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table test_table
>     at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
>     at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
>     ... 20 more
> Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add partitions to default.test_table
>     at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:147)
>     at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
>     ... 23 more
> Caused by: org.apache.hudi.com.amazonaws.services.glue.model.InvalidInputException: The number of partition keys do not match the number of partition values (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: e8d9adf2-13c4-4589-bbec-c578a827749f; Proxy: null)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>     at org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
>     at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:10640)
>     at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10607)
>     at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10596)
>     at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchCreatePartition(AWSGlueClient.java:259)
>     at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchCreatePartition(AWSGlueClient.java:228)
>     at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:139)
>     ... 24 more {code}
> The exception is thrown because the partition path values are not properly extracted during meta sync. "hoodie.datasource.hive_sync.partition_extractor_class" determines which partition value extractor to use, and in this case `MultiPartKeysValueExtractor` is inferred. The root cause is that this extractor splits the partition path on slashes, so an output date format containing slashes produces multiple values (e.g., "2022", "10", "02") for a single partition key ("createddate"), which fails the extraction.
> The fix is to introduce a new partition extractor that treats the partition path as a single value when there is only one partition column, instead of relying on `MultiPartKeysValueExtractor`.



-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
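The key/value mismatch described in the root cause above can be illustrated with a minimal sketch. This is not Hudi's actual implementation; the class and method names below are hypothetical, and only the splitting behavior of `MultiPartKeysValueExtractor` (split the relative partition path on "/") and the proposed single-value treatment are taken from the issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the two extraction strategies discussed in HUDI-4966.
public class PartitionExtractionSketch {

    // Mirrors what MultiPartKeysValueExtractor does: split the relative
    // partition path on "/" and treat each segment as a separate value.
    static List<String> extractBySplitting(String partitionPath) {
        return Arrays.asList(partitionPath.split("/"));
    }

    // The proposed fix for a table with a single partition column:
    // treat the whole partition path as one value.
    static List<String> extractAsSingleValue(String partitionPath) {
        return Collections.singletonList(partitionPath);
    }

    public static void main(String[] args) {
        // TimestampBasedKeyGenerator with output format "yyyy/MM/dd"
        // produces a relative partition path like this:
        String partitionPath = "2022/10/02";

        // Splitting yields 3 values for the single partition key
        // "createddate" -- exactly the mismatch the metastore rejects.
        System.out.println(extractBySplitting(partitionPath));   // [2022, 10, 02]

        // Treating the path as a whole yields 1 value for 1 key.
        System.out.println(extractAsSingleValue(partitionPath)); // [2022/10/02]
    }
}
```

With the splitting strategy, the metastore receives three values for one key and rejects the request (`Invalid partition key & values` from Hive, `InvalidInputException` from Glue); the single-value strategy keeps the key and value counts aligned.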