clp007 opened a new issue, #7960: URL: https://github.com/apache/hudi/issues/7960
**Describe the problem you faced**

There is a problem when synchronizing the Hudi table to BigQuery. I'm not sure what the problem is or how to solve it.

spark-submit --master yarn \
    --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
    --jars /opt/hudi-gcp-bundle-0.12.1.jar \
    --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
    /opt/hudi-utilities-bundle_2.12-0.12.1.jar \
    --target-base-path gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
    --target-table bubble-pop-b01a0 \
    --table-type COPY_ON_WRITE \
    --base-file-format PARQUET \
    --enable-sync \
    --sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
    --hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
    --hoodie-conf hoodie.gcp.bigquery.sync.project_id=transferred \
    --hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=temp_data \
    --hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1 \
    --hoodie-conf hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop \
    --hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW} \
    --hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=event_date \
    --hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://transfer-table-data/incremental/test/bubble-pop-b01a0/event_date=* \
    --hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
    --hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true \
    --hoodie-conf hoodie.gcp.bigquery.sync.assume_date_partitioning=false \
    --hoodie-conf hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id \
    --hoodie-conf hoodie.datasource.write.partitionpath.field=event_date \
    --hoodie-conf hoodie.datasource.write.precombine.field=event_timestamp \
    --hoodie-conf hoodie.datasource.write.keygenerator.type=COMPLEX \
    --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
    --hoodie-conf hoodie.datasource.write.drop.partition.columns=true \
    --hoodie-conf hoodie.partition.metafile.use.base.format=true \
    --hoodie-conf hoodie.metadata.enable=true \

**To Reproduce**

Steps to reproduce the behavior:

1. An error occurred when I ran the above script

**Environment Description**

* Hudi version : hudi-spark3.2-bundle_2.12:0.12.1
* Spark version : 3.1
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no

**Additional context**

Dataproc Spark

**Stacktrace**

```Add the stacktrace of the error.```

ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.DeltaSync: Shutting down embedded timeline server
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
23/02/15 08:15:04 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@2b10ace9{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org