ROOBALJINDAL opened a new issue, #10884:
URL: https://github.com/apache/hudi/issues/10884

   Issue:
   We have migrated from Hudi 0.13.0 to Hudi 0.14.0, and in this version upserts from CDC events consumed from Kafka are no longer working.
   The table is created on the first run, but afterwards any record added or updated in the SQL table (which pushes a CDC event to Kafka) is not reflected in the Hudi table. Is there any new configuration required for Hudi 0.14.0?
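   To illustrate, here is a minimal spark-shell check (with the Hudi bundle on the classpath) to see whether new commits land after the first run; the base path below is a placeholder for our actual S3 table path:
   ```scala
   // Diagnostic sketch: show the newest records by Hudi commit time.
   // "s3://my-bucket/database1/student" is a placeholder base path.
   import org.apache.spark.sql.functions.col

   val df = spark.read.format("hudi").load("s3://my-bucket/database1/student")
   df.select("_hoodie_commit_time", "studentsid")
     .orderBy(col("_hoodie_commit_time").desc)
     .show(10, false)
   ```
   If upserts were being applied, the newest `_hoodie_commit_time` should advance with each delta sync; in our case it stays at the initial commit.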
   
   We are running AWS EMR Serverless 6.15. We tried to enable debug-level logs by applying the following classification to the serverless application, which modifies the log4j2 properties so that logs from the Hudi package are printed, but Hudi logs still do not appear.
   ```
   {
     "classification": "spark-driver-log4j2",
     "properties": {
       "rootLogger.level": "debug",
       "logger.hudi.level": "debug",
       "logger.hudi.name": "org.apache.hudi"
     }
   },
   {
     "classification": "spark-executor-log4j2",
     "properties": {
       "rootLogger.level": "debug",
       "logger.hudi.level": "debug",
       "logger.hudi.name": "org.apache.hudi"
     }
   }
   ```
   Since the application is serverless, we cannot SSH into a node to inspect the log4j properties file, so we could not obtain Hudi logs.
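   As an alternative sketch (untested on EMR Serverless; the file name and flags are assumptions based on stock Spark and Log4j2 behavior, not EMR-specific documentation), a custom Log4j2 config could be shipped with the job and selected via the standard `log4j2.configurationFile` system property:
   ```
   --files log4j2-debug.properties
   --conf spark.driver.extraJavaOptions=-Dlog4j2.configurationFile=log4j2-debug.properties
   --conf spark.executor.extraJavaOptions=-Dlog4j2.configurationFile=log4j2-debug.properties
   ```
   with `log4j2-debug.properties` along these lines:
   ```
   # Minimal Log4j2 config: console appender, INFO root, DEBUG for org.apache.hudi
   appender.console.type = Console
   appender.console.name = STDOUT
   appender.console.layout.type = PatternLayout
   appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
   rootLogger.level = info
   rootLogger.appenderRef.stdout.ref = STDOUT
   logger.hudi.name = org.apache.hudi
   logger.hudi.level = debug
   ```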
   
   ### **Configurations:**
   
   ### **Spark job parameters:**
   ```
   --class org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer
   --conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED
   --conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED
   --conf spark.executor.instances=1
   --conf spark.executor.memory=4g
   --conf spark.driver.memory=4g
   --conf spark.driver.cores=4
   --conf spark.dynamicAllocation.initialExecutors=1
   --props kafka-source.properties
   --config-folder table-config
   --payload-class com.myorg.MssqlDebeziumAvroPayload
   --source-class com.myorg.MssqlDebeziumSource
   --source-ordering-field _event_lsn
   --enable-sync
   --table-type COPY_ON_WRITE
   --source-limit 1000000000
   --op UPSERT
   ```
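   To rule out the streamer silently committing nothing after the bootstrap run, here is a hedged sketch against the Hudi timeline API (the base path is a placeholder, and the method names are per the 0.14 source as we understand it):
   ```scala
   // List completed commits to see whether anything is committed after bootstrap.
   import org.apache.hadoop.conf.Configuration
   import org.apache.hudi.common.table.HoodieTableMetaClient

   val metaClient = HoodieTableMetaClient.builder()
     .setConf(new Configuration())
     .setBasePath("s3://my-bucket/database1/student") // placeholder
     .build()

   metaClient.getActiveTimeline
     .getCommitsTimeline
     .filterCompletedInstants()
     .getInstants
     .forEach(i => println(i))
   ```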
   
   ### **kafka-source.properties:**
   ```
   hoodie.streamer.ingestion.tablesToBeIngested=database1.student
   auto.offset.reset=earliest
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
   hoodie.streamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
   hoodie.streamer.schemaprovider.registry.url=
   schema.registry.url=http://schema-registry-xxxxx:8080/apis/ccompat/v6
   bootstrap.servers=b-1.xxxx.ikwdtc.c13.us-west-2.amazonaws.com:9096
   hoodie.streamer.schemaprovider.registry.baseUrl=http://schema-registry-xxxxx:8080/apis/ccompat/v6/subjects/
   hoodie.parquet.max.file.size=2147483648
   hoodie.parquet.small.file.limit=1073741824
   security.protocol=SASL_SSL
   sasl.mechanism=SCRAM-SHA-512
   sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="XXXX" password="xxxxx";
   ssl.truststore.location=/usr/lib/jvm/java/jre/lib/security/cacerts
   ssl.truststore.password=changeit
   ```
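   For reference, the schema the streamer should resolve can be fetched by hand; the URL below assumes the provider concatenates `registry.baseUrl` + topic + `registry.urlSuffix` (i.e. subject `dev.student-value`):
   ```scala
   // Hedged sketch: fetch the latest registered Avro schema for the subject.
   import scala.io.Source

   val url = "http://schema-registry-xxxxx:8080/apis/ccompat/v6/subjects/" +
     "dev.student-value/versions/latest"
   println(Source.fromURL(url).mkString) // should print the latest schema version
   ```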
    
   ### **Table config properties:**
   ```
   hoodie.datasource.hive_sync.database=database1
   hoodie.datasource.hive_sync.support_timestamp=true
   hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
   hoodie.datasource.write.recordkey.field=studentsid
   hoodie.datasource.write.partitionpath.field=studentcreationdate
   hoodie.datasource.hive_sync.table=student
   hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true
   hoodie.datasource.hive_sync.partition_fields=studentcreationdate
   hoodie.keygen.timebased.timestamp.type=SCALAR
   hoodie.keygen.timebased.timestamp.scalar.time.unit=DAYS
   hoodie.keygen.timebased.input.dateformat=yyyy-MM-dd
   hoodie.keygen.timebased.output.dateformat=yyyy-MM-01
   hoodie.keygen.timebased.timezone=GMT+8:00
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.datasource.hive_sync.mode=hms
   hoodie.streamer.source.kafka.topic=dev.student
   hoodie.streamer.schemaprovider.registry.urlSuffix=-value/versions/latest
   ```
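   For context, here is what we expect those key-generator settings to produce for a Debezium-style date stored as days since the epoch; this is a plain java.time illustration of the intended mapping, not Hudi's internal code:
   ```scala
   // SCALAR/DAYS: the field is a count of days since the epoch; the output
   // format pins every date to the first of its month (the digits in
   // "yyyy-MM-01" are literals), which becomes the partition path value.
   import java.time.{Instant, ZoneId}
   import java.time.format.DateTimeFormatter

   val daysSinceEpoch = 19816L // hypothetical 'studentcreationdate' value
   val instant = Instant.EPOCH.plusSeconds(daysSinceEpoch * 86400L)
   val fmt = DateTimeFormatter.ofPattern("yyyy-MM-01").withZone(ZoneId.of("GMT+08:00"))
   println(fmt.format(instant)) // 2024-04-01 -> studentcreationdate=2024-04-01
   ```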
   
   **Environment Description**
   * Hudi version : 0.14.0
   
   * Spark version : 3.4.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.6
   
   * Storage (HDFS/S3/GCS..) : S3
   
   
   

