[ 
https://issues.apache.org/jira/browse/HUDI-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5689:
----------------------------
    Description: 
After enabling CDC, Deltastreamer fails to ingest data:
{code:bash}
spark-submit \
  --master yarn \
  --jars /mnt1/hudi-jars/hudi-spark-bundle.jar,/mnt1/hudi-jars/hudi-utilities-slim-bundle.jar \
  --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED \
  --conf spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /mnt1/hudi-jars/hudi-utilities-slim-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://rbi-datalake-dev/elbowpt3_hudi_2/staccbi_elbowpt/emr_new5 \
  --target-table emr \
  --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=_id \
  --hoodie-conf hoodie.table.cdc.enabled=true \
  --hoodie-conf hoodie.table.cdc.supplemental.logging.mode=cdc_data_before_after \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=package \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://rbi-datalake-dev/elbowpt3/staccbi_elbowpt/emr
{code}
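For context, once an ingestion like the one above succeeds on a CDC-enabled table, the change feed is consumed through Hudi's incremental query path with the CDC format. A minimal sketch of the reader options, assuming Spark DataSource reads against a hypothetical `base_path` (option names per Hudi 0.13 CDC read support; verify against the release docs):
{code:python}
# Sketch only: reader options for a Hudi CDC incremental query.
# "0" as the begin instant time means "from the start of the timeline".
cdc_read_options = {
    "hoodie.datasource.query.type": "incremental",
    "hoodie.datasource.query.incremental.format": "cdc",
    "hoodie.datasource.read.begin.instanttime": "0",
}

# With a SparkSession in scope, the change records (op, before, after images)
# could then be loaded as:
#   spark.read.format("hudi").options(**cdc_read_options).load(base_path)
print(sorted(cdc_read_options))
{code}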
 

> CDC fails in Deltastreamer
> --------------------------
>
>                 Key: HUDI-5689
>                 URL: https://issues.apache.org/jira/browse/HUDI-5689
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Blocker
>             Fix For: 0.13.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
