ROOBALJINDAL opened a new issue, #10884:
URL: https://github.com/apache/hudi/issues/10884
Issue:
We have migrated from Hudi 0.13.0 to Hudi 0.14.0, and in this version, upserts from Kafka CDC
events are not working.
The table is created the first time, but afterwards any record added or updated
in the SQL table (which pushes a CDC event to Kafka) is not reflected in the
Hudi table. Is there any new configuration required for Hudi 0.14.0?
We are running AWS EMR Serverless 6.15. We tried to enable debug-level logs
by providing the following classifications to the serverless application, which modify the log4j2
properties so that logs from the Hudi packages are printed, but no Hudi logs appear.
```
[
  {
    "classification": "spark-driver-log4j2",
    "properties": {
      "rootLogger.level": "debug",
      "logger.hudi.level": "debug",
      "logger.hudi.name": "org.apache.hudi"
    }
  },
  {
    "classification": "spark-executor-log4j2",
    "properties": {
      "rootLogger.level": "debug",
      "logger.hudi.level": "debug",
      "logger.hudi.name": "org.apache.hudi"
    }
  }
]
```
Since it is serverless, we cannot SSH into a node to inspect the log4j2
properties file, so we could not obtain the Hudi logs.
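For reference, a minimal sketch of a programmatic workaround we could try from a custom driver entry point (the wrapper class below is hypothetical; it assumes log4j-core 2.x is on the classpath, as it is with Spark 3.4):
```
// Hypothetical sketch: force the Hudi logger to DEBUG at runtime instead of
// relying on the EMR Serverless log4j2 classifications above.
// Assumes log4j-core 2.x on the classpath (Spark 3.4 ships Log4j2).
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class HudiDebugLogging {
    public static void main(String[] args) {
        // Equivalent to logger.hudi.name=org.apache.hudi / logger.hudi.level=debug
        Configurator.setLevel("org.apache.hudi", Level.DEBUG);
        // ... then invoke the streamer's own main() so the setting applies
        // to the driver-side Hudi code.
    }
}
```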
### Configurations

**Spark job parameters:**
```
--class org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer
--conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED
--conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED
--conf spark.executor.instances=1
--conf spark.executor.memory=4g
--conf spark.driver.memory=4g
--conf spark.driver.cores=4
--conf spark.dynamicAllocation.initialExecutors=1
--props kafka-source.properties
--config-folder table-config
--payload-class com.myorg.MssqlDebeziumAvroPayload
--source-class com.myorg.MssqlDebeziumSource
--source-ordering-field _event_lsn
--enable-sync
--table-type COPY_ON_WRITE
--source-limit 10
--op UPSERT
```
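For context, Hudi 0.14.0 ships Debezium payloads for Postgres and MySQL but not for MSSQL, so the `com.myorg` classes above are custom. A minimal sketch of what the payload wrapper might look like, assuming it simply delegates to the built-in `PostgresDebeziumAvroPayload` (which also orders records by `_event_lsn`); the actual implementation is not shown in this issue:
```
// Hypothetical sketch of the custom payload referenced in --payload-class.
// Assumes it is a thin wrapper over Hudi's built-in Postgres Debezium payload,
// which resolves conflicts using the same _event_lsn ordering field.
package com.myorg;

import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload;
import org.apache.hudi.common.util.Option;

public class MssqlDebeziumAvroPayload extends PostgresDebeziumAvroPayload {

    public MssqlDebeziumAvroPayload(GenericRecord record, Comparable orderingVal) {
        super(record, orderingVal);
    }

    public MssqlDebeziumAvroPayload(Option<GenericRecord> record) {
        super(record);
    }
}
```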
**kafka-source.properties:**
```
hoodie.streamer.ingestion.tablesToBeIngested=database1.student
auto.offset.reset=earliest
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.streamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
hoodie.streamer.schemaprovider.registry.url=
schema.registry.url=http://schema-registry-x:8080/apis/ccompat/v6
bootstrap.servers=b-1..ikwdtc.c13.us-west-2.amazonaws.com:9096
hoodie.streamer.schemaprovider.registry.baseUrl=http://schema-registry-x:8080/apis/ccompat/v6/subjects/
hoodie.parquet.max.file.size=2147483648
hoodie.parquet.small.file.limit=1073741824
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="" password="x";
ssl.truststore.location=/usr/lib/jvm/java/jre/lib/security/cacerts
ssl.truststore.password=changeit
```
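To help rule out the Kafka side, a minimal consumer probe we could run against the same topic (hypothetical class; the broker and security settings should be copied from the properties above):
```
// Hypothetical verification sketch: confirm that Debezium CDC events are
// actually arriving on the topic before suspecting the Hudi writer.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<broker from kafka-source.properties>");
        props.put("group.id", "hudi-debug-probe"); // throwaway consumer group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // security.protocol / sasl.* / ssl.* omitted here; copy from above.

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("dev.student"));
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<byte[], byte[]> r : records) {
                System.out.printf("offset=%d key-bytes=%d value-bytes=%d%n",
                        r.offset(),
                        r.key() == null ? 0 : r.key().length,
                        r.value() == null ? 0 : r.value().length);
            }
        }
    }
}
```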
**Table config properties:**
```
hoodie.datasource.hive_sync.database=database1
hoodie.datasource.hive_sync.support_timestamp=true
hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
hoodie.datasource.write.recordkey.field=studentsid
hoodie.datasource.write.partitionpath.field=studentcreationdate
hoodie.datasource.hive_sync.table=student
hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true
hoodie.datasource.hive_sync.partition_fields=studentcreationdate
hoodie.keygen.timebased.timestamp.type=SCALAR
hoodie.keygen.timebased.timestamp.scalar.time.unit=DAYS
hoodie.keygen.timebased.input.dateformat=-MM-dd
hoodie.keygen.timebased.output.dateformat=-MM-01
hoodie.keygen.timebased.timezone=GMT+8:00
hoodie.datasource.write.hive_style_partitioning=true
hoodie.datasource.hive_sync.mode=hms
hoodie.streamer.source.kafka.topic=dev.student
hoodie.streamer.schemaprovider.registry.urlSuffix=-value/versions/latest
```
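For reference, a minimal sketch (hypothetical class, arbitrary example value) of how these TimestampBasedKeyGenerator settings should map the scalar `studentcreationdate` field to a partition value, assuming SCALAR with time unit DAYS is interpreted as days since the Unix epoch:
```
// Hypothetical sketch of the partition value produced by the keygen settings
// above. Assumption: SCALAR/DAYS converts the field value to epoch millis
// (value * 86_400_000) and then applies output.dateformat in the configured
// timezone.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PartitionPathDemo {
    public static void main(String[] args) {
        long scalarDays = 19_800L; // example studentcreationdate field value
        long epochMillis = scalarDays * 24L * 60 * 60 * 1000;

        // output.dateformat and timezone taken from the table config above
        SimpleDateFormat fmt = new SimpleDateFormat("-MM-01");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+8:00"));

        // hive_style_partitioning=true prefixes the partition field name
        System.out.println("studentcreationdate=" + fmt.format(new Date(epochMillis)));
    }
}
```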
**Environment Description**
* Hudi version : 0.14.0
* Spark version : 3.4.1
* Hive version : 3.1.3
* Hadoop version : 3.3.6
* Storage (HDFS/S3/GCS..) : S3