ROOBALJINDAL opened a new issue, #10884
URL: https://github.com/apache/hudi/issues/10884
**Issue:** We migrated from Hudi 0.13.0 to Hudi 0.14.0, and on this version upserts driven by CDC events from Kafka are not working. The table is created the first time, but afterwards any record added or updated in the SQL table (which pushes a CDC event to Kafka) never shows up in the Hudi table. Is there a new configuration required for Hudi 0.14.0?

We are running AWS EMR Serverless 6.15. We tried to enable debug-level logging by attaching the following classifications to the serverless application, which modify the log4j2 properties so that `org.apache.hudi` package logs are printed, but nothing is logged:

```
{
  "classification": "spark-driver-log4j2",
  "properties": {
    "rootLogger.level": "debug",
    "logger.hudi.level": "debug",
    "logger.hudi.name": "org.apache.hudi"
  }
},
{
  "classification": "spark-executor-log4j2",
  "properties": {
    "rootLogger.level": "debug",
    "logger.hudi.level": "debug",
    "logger.hudi.name": "org.apache.hudi"
  }
}
```

Since it is serverless, we cannot SSH-tunnel into a node to inspect the log4j property file, so we could not get the Hudi logs.

### Configurations

**Spark job parameters:**

```
--class org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer
--conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED
--conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED
--conf spark.executor.instances=1
--conf spark.executor.memory=4g
--conf spark.driver.memory=4g
--conf spark.driver.cores=4
--conf spark.dynamicAllocation.initialExecutors=1
--props kafka-source.properties
--config-folder table-config
--payload-class com.myorg.MssqlDebeziumAvroPayload
--source-class com.myorg.MssqlDebeziumSource
--source-ordering-field _event_lsn
--enable-sync
--table-type COPY_ON_WRITE
--source-limit 1000000000
--op UPSERT
```

**kafka-source.properties:**

```
hoodie.streamer.ingestion.tablesToBeIngested=database1.student
auto.offset.reset=earliest
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.streamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
hoodie.streamer.schemaprovider.registry.url=
schema.registry.url=http://schema-registry-xxxxx:8080/apis/ccompat/v6
bootstrap.servers=b-1.xxxx.ikwdtc.c13.us-west-2.amazonaws.com:9096
hoodie.streamer.schemaprovider.registry.baseUrl=http://schema-registry-xxxxx:8080/apis/ccompat/v6/subjects/
hoodie.parquet.max.file.size=2147483648
hoodie.parquet.small.file.limit=1073741824
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="XXXX" password="xxxxx";
ssl.truststore.location=/usr/lib/jvm/java/jre/lib/security/cacerts
ssl.truststore.password=changeit
```

**Table config properties:**

```
hoodie.datasource.hive_sync.database=database1
hoodie.datasource.hive_sync.support_timestamp=true
hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
hoodie.datasource.write.recordkey.field=studentsid
hoodie.datasource.write.partitionpath.field=studentcreationdate
hoodie.datasource.hive_sync.table=student
hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true
hoodie.datasource.hive_sync.partition_fields=studentcreationdate
hoodie.keygen.timebased.timestamp.type=SCALAR
hoodie.keygen.timebased.timestamp.scalar.time.unit=DAYS
hoodie.keygen.timebased.input.dateformat=yyyy-MM-dd
hoodie.keygen.timebased.output.dateformat=yyyy-MM-01
hoodie.keygen.timebased.timezone=GMT+8:00
hoodie.datasource.write.hive_style_partitioning=true
hoodie.datasource.hive_sync.mode=hms
hoodie.streamer.source.kafka.topic=dev.student
hoodie.streamer.schemaprovider.registry.urlSuffix=-value/versions/latest
```

**Environment Description**

* Hudi version : 0.14.0
* Spark version : 3.4.1
* Hive version : 3.1.3
* Hadoop version : 3.3.6
* Storage (HDFS/S3/GCS..) : S3
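For reference, assuming EMR Serverless merges the `spark-driver-log4j2` / `spark-executor-log4j2` classification overrides into the Spark processes' Log4j2 configuration, the intended effect of the properties above would be equivalent to this `log4j2.properties` fragment (a sketch based on standard Log4j2 properties syntax, not an EMR-verified file):

```properties
# Root logger raised to debug
rootLogger.level = debug

# Named logger scoped to the Hudi package
logger.hudi.name = org.apache.hudi
logger.hudi.level = debug
```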
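Since the driver logs are unavailable, one sanity check that does not depend on logging is the table's timeline: each completed write on a COPY_ON_WRITE table leaves an `<instant>.commit` file under `<base path>/.hoodie`. A minimal sketch of that check (the local path and instant name are placeholders; against S3 you would run `aws s3 ls` on the table's base path instead):

```shell
#!/bin/sh
# Sketch: count completed commits on the table's timeline.
# BASE_PATH is a placeholder for the table base path from the streamer config.
BASE_PATH="${BASE_PATH:-/tmp/hudi/database1/student}"

# Demo scaffolding only: fake a timeline containing one completed commit.
mkdir -p "$BASE_PATH/.hoodie"
touch "$BASE_PATH/.hoodie/20240321101500000.commit"

# Completed write instants appear as *.commit files; if this count stops
# growing while CDC events keep arriving, the upserts are not being applied.
ls "$BASE_PATH/.hoodie" | grep -c '\.commit$'
```

If the count keeps growing with each ingestion round but the new records still do not appear in queries, the problem is more likely on the payload/merge side than on the Kafka source side.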