rohitmittapalli opened a new issue, #10203: URL: https://github.com/apache/hudi/issues/10203
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

A brand-new HoodieStreamer run against an empty target folder fails to create the metadata table. This is running on a fresh build of the hudi-utilities-bundle jar off the tip of 0.14.0.

**To Reproduce**

Steps to reproduce the behavior:

1. Build the Hudi utilities bundle
2. Start with an empty source and an empty target
3. Run the delta-streamer script below

**Expected behavior**

The streamer bootstraps the table (including the metadata table) on the empty target path and begins ingesting.

**Environment Description**

* Hudi version : 0.14.0
* Spark version : 3.1.2
* Hive version :
* Hadoop version : 3.2.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : yes

**Additional context**

Running on Spark on Kubernetes.
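One data point that may help triage: Hudi keeps the metadata table under `<base-path>/.hoodie/metadata`, so listing that prefix shows whether initialization got anywhere before the exception. A minimal sketch, where the bucket and prefix are hypothetical stand-ins for the real `--target-base-path` (`$4` in the script below):

```
# Hypothetical bucket/prefix standing in for the real --target-base-path ($4).
# Hudi's metadata table lives under <base-path>/.hoodie/metadata.
aws s3 ls s3://example-bucket/per_tick_stats_14/.hoodie/
aws s3 ls --recursive s3://example-bucket/per_tick_stats_14/.hoodie/metadata/
# An empty or missing metadata/ prefix after the failure suggests first-time
# initialization never completed, rather than a later sync going wrong.
```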
**Stacktrace**

```
Exception in thread "main" org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service was shut down with exception.
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
    ... 15 more
Caused by: org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:796)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
    at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
    at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1256)
    at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
    at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:223)
    at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:217)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:782)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:757)
    ... 4 more
```
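The innermost frames show the very first `bulkInsert` failing while bootstrapping the metadata table (`SparkRDDWriteClient.initializeMetadataTable`). As a diagnostic, not a fix, rerunning the same command with the metadata table and the dependent column-stats index disabled should tell whether the write path itself is healthy; the elided arguments below are unchanged from the full script that follows:

```
# Diagnostic rerun: identical to the full script below, overriding two configs.
# The column stats index is built inside the metadata table, so both must be off.
/opt/spark/bin/spark-submit ... \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer /hudi_14_base_jars/hudi-utilities-bundle-14.jar \
  ... \
  --hoodie-conf hoodie.metadata.enable=false \
  --hoodie-conf hoodie.metadata.index.column.stats.enable=false \
  --op BULK_INSERT
# If ingestion then succeeds, the failure is confined to first-time metadata
# table initialization on the empty S3 base path.
```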
Deltastreamer script:

```
/opt/spark/bin/spark-submit \
  --jars /hudi_14_base_jars/hudi-utilities-bundle-14.jar,/opt/spark/jars/hadoop-aws.jar,/opt/spark/jars/aws-java-sdk.jar,/opt/spark/jars/hadoop-azure.jar,/opt/spark/jars/wildfly-openssl.jar,/opt/spark/jars/AzureTokenGen.jar,/opt/spark/jars/guava-gcp.jar,/opt/spark/jars/gcs-connector.jar \
  --master ${18} \
  --deploy-mode client \
  --name pts-deltastreamer-k8s-14 \
  --conf spark.driver.port=8090 \
  --conf spark.hadoop.fs.azure.account.auth.type.${26}.dfs.core.windows.net=Custom \
  --conf spark.hadoop.fs.azure.account.oauth.provider.type.${26}.dfs.core.windows.net=applied.java.AzureTokenProvider \
  --conf spark.hadoop.token=${27} \
  --conf spark.hadoop.expiry=${28} \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.connection.maximum=10000 \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.driver.host=spark-deltastreamer-hudi-14-driver-headless \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.kubernetes.namespace=${20} \
  --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-driver-svcaccount \
  --conf spark.kubernetes.node.selector.purpose=spark \
  --conf spark.kubernetes.executor.podnameprefix=partitioned-pts-deltastreamer \
  --conf spark.jars.ivy=/tmp/.ivy \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  --conf spark.kubernetes.container.image=quay.io/applied_dev/dp_spark_k8s:test-0.14 \
  --conf spark.executor.instances=$1 \
  --conf spark.driver.memory=${19} \
  --conf spark.executor.memory=$2 \
  --conf spark.kubernetes.driver.request.cores=${21} \
  --conf spark.kubernetes.driver.limit.cores=${22} \
  --conf spark.kubernetes.executor.request.cores=${23} \
  --conf spark.kubernetes.executor.limit.cores=${24} \
  --conf spark.kubernetes.driver.pod.name=${25} \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer /hudi_14_base_jars/hudi-utilities-bundle-14.jar \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --target-table per_tick_stats_14 \
  --table-type COPY_ON_WRITE \
  --min-sync-interval-seconds 300 \
  --source-limit ${17} \
  --continuous \
  --source-ordering-field $6 \
  --target-base-path $4 \
  --hoodie-conf hoodie.clustering.async.enabled=${10} \
  --hoodie-conf hoodie.clustering.plan.strategy.sort.columns=$8 \
  --hoodie-conf hoodie.clustering.plan.strategy.max.bytes.per.group=${12} \
  --hoodie-conf hoodie.clustering.plan.strategy.max.num.groups=${13} \
  --hoodie-conf hoodie.clustering.plan.strategy.small.file.limit=${14} \
  --hoodie-conf hoodie.clustering.plan.strategy.target.file.max.bytes=${15} \
  --hoodie-conf hoodie.clustering.async.max.commits=${16} \
  --hoodie-conf hoodie.streamer.source.dfs.root=$3 \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=$7 \
  --hoodie-conf hoodie.datasource.write.precombine.field=$6 \
  --hoodie-conf hoodie.metadata.enable=true \
  --hoodie-conf hoodie.metadata.index.column.stats.enable=true \
  --hoodie-conf hoodie.metadata.index.column.stats.column.list=$9 \
  --hoodie-conf hoodie.bulkinsert.shuffle.parallelism=${11} \
  --hoodie-conf hoodie.write.markers.type=DIRECT \
  --hoodie-conf hoodie.datasource.write.partitionpath.field="" \
  --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=$5 \
  --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=$5 \
  --op BULK_INSERT
```
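For anyone trying to reproduce without the Kubernetes setup, here is a trimmed sketch of the same run against the local filesystem. The `file:///tmp/...` paths and the `id`/`ts` field names are hypothetical placeholders for the parameterized values above; all flags are taken from the full script:

```
# Minimal local reproduction sketch (hypothetical paths and field names).
/opt/spark/bin/spark-submit \
  --master 'local[4]' \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  /hudi_14_base_jars/hudi-utilities-bundle-14.jar \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --target-table per_tick_stats_14 \
  --table-type COPY_ON_WRITE \
  --target-base-path file:///tmp/hudi/per_tick_stats_14 \
  --source-ordering-field ts \
  --hoodie-conf hoodie.streamer.source.dfs.root=file:///tmp/hudi/source \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id \
  --hoodie-conf hoodie.datasource.write.precombine.field=ts \
  --hoodie-conf hoodie.datasource.write.partitionpath.field="" \
  --hoodie-conf hoodie.metadata.enable=true \
  --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=file:///tmp/hudi/schema.avsc \
  --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=file:///tmp/hudi/schema.avsc \
  --op BULK_INSERT
```

If the exception only appears with the S3 base path and not locally, that would point at first-time metadata table creation against S3A rather than at the streamer configuration itself.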