clp007 opened a new issue, #7960:
URL: https://github.com/apache/hudi/issues/7960

   
   **Describe the problem you faced**
   
   I ran into a problem when syncing a Hudi table to BigQuery, and I'm not sure what is causing it or how to fix it. This is the command I ran:
   
```bash
spark-submit --master yarn \
  --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
  --jars /opt/hudi-gcp-bundle-0.12.1.jar \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /opt/hudi-utilities-bundle_2.12-0.12.1.jar \
  --target-base-path gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --target-table bubble-pop-b01a0 \
  --table-type COPY_ON_WRITE \
  --base-file-format PARQUET \
  --enable-sync \
  --sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --hoodie-conf hoodie.gcp.bigquery.sync.project_id=transferred \
  --hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=temp_data \
  --hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1 \
  --hoodie-conf hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop \
  --hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW} \
  --hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=event_date \
  --hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://transfer-table-data/incremental/test/bubble-pop-b01a0/event_date=* \
  --hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true \
  --hoodie-conf hoodie.gcp.bigquery.sync.assume_date_partitioning=false \
  --hoodie-conf hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=event_date \
  --hoodie-conf hoodie.datasource.write.precombine.field=event_timestamp \
  --hoodie-conf hoodie.datasource.write.keygenerator.type=COMPLEX \
  --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
  --hoodie-conf hoodie.datasource.write.drop.partition.columns=true \
  --hoodie-conf hoodie.partition.metafile.use.base.format=true \
  --hoodie-conf hoodie.metadata.enable=true
```
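One way to keep a long invocation like this intact when pasting or editing it (a sketch, not something from the original report) is to build the arguments in a bash array, so no line depends on a trailing backslash. Every flag and value below is copied from the command above. `NOW` is never defined in the report, so the sketch assumes a UTC timestamp when it is unset; `BASE`, `hoodie_confs`, and `args` are hypothetical variable names. The script only prints the expanded command instead of running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: NOW is undefined in the report; fall back to a UTC timestamp.
NOW="${NOW:-$(date -u +%Y%m%d%H%M%S)}"
# Hypothetical helper variable for the repeated table path.
BASE="gs://transfer-table-data/incremental/test/bubble-pop-b01a0"

# Each entry becomes one "--hoodie-conf key=value" pair (values from the report).
hoodie_confs=(
  "hoodie.deltastreamer.source.dfs.root=${BASE}"
  "hoodie.gcp.bigquery.sync.project_id=transferred"
  "hoodie.gcp.bigquery.sync.dataset_name=temp_data"
  "hoodie.gcp.bigquery.sync.dataset_location=us-central1"
  "hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop"
  "hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW}"
  "hoodie.gcp.bigquery.sync.partition_fields=event_date"
  "hoodie.gcp.bigquery.sync.source_uri=${BASE}/event_date=*"
  "hoodie.gcp.bigquery.sync.source_uri_prefix=${BASE}"
  "hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true"
  "hoodie.gcp.bigquery.sync.assume_date_partitioning=false"
  "hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id"
  "hoodie.datasource.write.partitionpath.field=event_date"
  "hoodie.datasource.write.precombine.field=event_timestamp"
  "hoodie.datasource.write.keygenerator.type=COMPLEX"
  "hoodie.datasource.write.hive_style_partitioning=true"
  "hoodie.datasource.write.drop.partition.columns=true"
  "hoodie.partition.metafile.use.base.format=true"
  "hoodie.metadata.enable=true"
)

# spark-submit options and DeltaStreamer flags, in the same order as the original command.
args=(
  --master yarn
  --packages com.google.cloud:google-cloud-bigquery:2.10.4
  --jars /opt/hudi-gcp-bundle-0.12.1.jar
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
  /opt/hudi-utilities-bundle_2.12-0.12.1.jar
  --target-base-path "${BASE}"
  --target-table bubble-pop-b01a0
  --table-type COPY_ON_WRITE
  --base-file-format PARQUET
  --enable-sync
  --sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool
)

# Append the Hudi configs; quoting keeps "event_date=*" from being glob-expanded.
for conf in "${hoodie_confs[@]}"; do
  args+=(--hoodie-conf "$conf")
done

# Print the shell-quoted command rather than executing it.
printf '%q ' spark-submit "${args[@]}"
echo
```

To launch the job instead of printing it, the `printf`/`echo` lines can be replaced with `spark-submit "${args[@]}"`.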
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run the `spark-submit` command above; the job fails with the error shown in the stacktrace below.
   
   **Environment Description**
   
   * Hudi version: hudi-spark3.2-bundle_2.12:0.12.1
   * Spark version: 3.1
   * Storage (HDFS/S3/GCS...): GCS
   * Running on Docker? (yes/no): no
   
   **Additional context**
   
   Spark running on Google Cloud Dataproc.
   
   **Stacktrace**
   
```
ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
        at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.DeltaSync: Shutting down embedded timeline server
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
23/02/15 08:15:04 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@2b10ace9{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
        at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```