ddai-shippo opened a new issue, #6316:
URL: https://github.com/apache/hudi/issues/6316

   **Describe the problem you faced**
   
   I'm new to Hudi, so this could very well be a configuration issue on my side. I'm trying to set up a continuous multi-table ingestion job. I can successfully ingest multiple tables when running without the `--continuous` flag, so the properties seem to be set up correctly. When I add the `--continuous` flag, the job seems to ingest data for the first table in continuous mode and never proceeds to ingest data from the other tables.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Set up multi-table ingestion with the `--continuous` flag
   2. Run the Hudi job
   3. Observe that the first table processes new incoming data, but the other tables do not progress
   
   **Expected behavior**
   
   Multi-table ingestion with the `--continuous` flag should process data continually, cycling through all tables
   
   **Environment Description**
   Using AWS EMR Release 6.6
   
   * Hudi version : 0.10.1-amzn-0
   
   * Spark version : 3.2.0
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   ```
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
     --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.10.1,org.apache.spark:spark-avro_2.12:2.4.5 \
     --master yarn \
     --deploy-mode cluster \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     /usr/lib/hudi/hudi-utilities-bundle_2.12-0.10.1-amzn-0.jar \
     --table-type COPY_ON_WRITE \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --source-ordering-field ts \
     --base-path-prefix s3://{hudi_root} \
     --target-table dummy_table \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --props s3://{properties_file} \
     --config-folder s3://{config_directory} \
     --continuous \
     --min-sync-interval-seconds 60 \
     --source-limit 2147483648
   ```
   
   **Stacktrace**
   
   