Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-04-01 Thread via GitHub


ROOBALJINDAL closed issue #10884: [SUPPORT] Hudi cdc upserts stopped working 
after migrating from hudi 13.1 to 14.0
URL: https://github.com/apache/hudi/issues/10884


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-04-01 Thread via GitHub


ROOBALJINDAL commented on issue #10884:
URL: https://github.com/apache/hudi/issues/10884#issuecomment-2029401141

   I have found the issue. We were using a custom MssqlDebeziumSource class as the Debezium source, and in its constructor we were passing `HoodieStreamerMetrics` instead of `HoodieIngestionMetrics` (which was introduced in Hudi 0.14.0).
   
   Once we corrected the class, it started working. We can close this issue.
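
   For readers hitting the same symptom, the fix amounts to changing the metrics type passed through the custom source's constructor. Below is a minimal sketch, not the reporter's actual class; the package names and the exact `DebeziumSource` constructor signature are assumptions based on the Hudi 0.14.0 source tree and should be verified against your Hudi version:
   ```java
// Hypothetical reconstruction of the custom source described above; the
// package names and constructor signature are assumptions and should be
// checked against the Hudi 0.14.0 sources before use.
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.utilities.ingestion.HoodieIngestionMetrics;
import org.apache.hudi.utilities.schema.SchemaProvider;
import org.apache.hudi.utilities.sources.debezium.DebeziumSource;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MssqlDebeziumSource extends DebeziumSource {

  // 0.13.x custom sources typically passed HoodieStreamerMetrics here;
  // in 0.14.0 the parent constructor expects HoodieIngestionMetrics.
  public MssqlDebeziumSource(TypedProperties props,
                             JavaSparkContext sparkContext,
                             SparkSession sparkSession,
                             SchemaProvider schemaProvider,
                             HoodieIngestionMetrics metrics) {
    super(props, sparkContext, sparkSession, schemaProvider, metrics);
  }

  // DebeziumSource subclasses implement source-specific event flattening
  // here; a pass-through is shown only to keep the sketch compilable.
  @Override
  protected Dataset<Row> processDataset(Dataset<Row> rowDataset) {
    return rowDataset;
  }
}
   ```
   The constructor parameter type is the only change that matters for this issue; everything else in the class can stay as it was in 0.13.x.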





Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-03-19 Thread via GitHub


ad1happy2go commented on issue #10884:
URL: https://github.com/apache/hudi/issues/10884#issuecomment-2007239157

   I don't think it is a Kafka version issue, since the job is not failing. We need more logs to debug this further.





Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-03-19 Thread via GitHub


ROOBALJINDAL commented on issue #10884:
URL: https://github.com/apache/hudi/issues/10884#issuecomment-2006856626

   @ad1happy2go I need time to set up a new cluster. Our AWS MSK Kafka cluster uses Kafka version 2.6.2; can you confirm whether this is fine or whether it could be the issue? Is there a specific supported Kafka version?





Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-03-19 Thread via GitHub


ad1happy2go commented on issue #10884:
URL: https://github.com/apache/hudi/issues/10884#issuecomment-2006696281

   @ROOBALJINDAL Is it possible to try the same on EMR, so that you get all the logs needed to look into this further? There are no known changes in the 0.14.0 upgrade that could cause this.





Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-03-19 Thread via GitHub


ROOBALJINDAL commented on issue #10884:
URL: https://github.com/apache/hudi/issues/10884#issuecomment-2006449206

   @nsivabalan can you please check?





[I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]

2024-03-19 Thread via GitHub


ROOBALJINDAL opened a new issue, #10884:
URL: https://github.com/apache/hudi/issues/10884

   Issue: 
   We migrated from Hudi 0.13.0 to Hudi 0.14.0, and in this version CDC upserts from Kafka have stopped working.
   The table is created the first time, but afterwards any record added or updated in the SQL table (which pushes a CDC event to Kafka) is not reflected in the Hudi table. Is there any new configuration required for Hudi 0.14.0?
   
   We are running AWS EMR Serverless 6.15. We tried to enable debug-level logging by providing the following classification to the serverless app, which modifies the log4j properties to print logs from the Hudi packages, but these logs still do not appear.
   ```
   {
 "classification": "spark-driver-log4j2",
 "properties": {
   "rootLogger.level": "debug",
   "logger.hudi.level": "debug",
   "logger.hudi.name": "org.apache.hudi"
 }
   },
   {
 "classification": "spark-executor-log4j2",
 "properties": {
   "rootLogger.level": "debug",
   "logger.hudi.level": "debug",
   "logger.hudi.name": "org.apache.hudi"
 }
   }
   ```
   Since it is serverless, we cannot SSH into a node to inspect the log4j properties file, so we could not get the Hudi logs.
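
   On EMR Serverless, classifications like the one above are supplied at job-submission time rather than via a file on the node. A hedged sketch of how that might look with the AWS CLI; the application ID, role ARN, and S3 paths are placeholders, and the exact JSON shape should be checked against the current EMR Serverless documentation:
   ```shell
# Submit a Spark job with the log4j2 classification applied at runtime.
# All identifiers below are placeholders, not values from this issue.
aws emr-serverless start-job-run \
  --application-id <application-id> \
  --execution-role-arn <execution-role-arn> \
  --job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/my-job.jar"}}' \
  --configuration-overrides '{
    "applicationConfiguration": [
      {
        "classification": "spark-driver-log4j2",
        "properties": {
          "rootLogger.level": "debug",
          "logger.hudi.level": "debug",
          "logger.hudi.name": "org.apache.hudi"
        }
      }
    ],
    "monitoringConfiguration": {
      "s3MonitoringConfiguration": {"logUri": "s3://my-bucket/logs/"}
    }
  }'
   ```
   With an S3 monitoring configuration set, the driver and executor logs land in the given bucket, which sidesteps the inability to SSH into a node.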
   
   ### **Configurations:**
   
   **Spark job parameters:**
   ```
   --class org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer
   --conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED
   --conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED
   --conf spark.executor.instances=1
   --conf spark.executor.memory=4g
   --conf spark.driver.memory=4g
   --conf spark.driver.cores=4
   --conf spark.dynamicAllocation.initialExecutors=1
   --props kafka-source.properties
   --config-folder table-config
   --payload-class com.myorg.MssqlDebeziumAvroPayload
   --source-class com.myorg.MssqlDebeziumSource
   --source-ordering-field _event_lsn
   --enable-sync
   --table-type COPY_ON_WRITE
   --source-limit 10
   --op UPSERT
   ```
   
   **kafka-source.properties:**
   ```
   hoodie.streamer.ingestion.tablesToBeIngested=database1.student
   auto.offset.reset=earliest
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
   hoodie.streamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
   hoodie.streamer.schemaprovider.registry.url=
   schema.registry.url=http://schema-registry-x:8080/apis/ccompat/v6
   bootstrap.servers=b-1..ikwdtc.c13.us-west-2.amazonaws.com:9096
   hoodie.streamer.schemaprovider.registry.baseUrl=http://schema-registry-x:8080/apis/ccompat/v6/subjects/
   hoodie.parquet.max.file.size=2147483648
   hoodie.parquet.small.file.limit=1073741824
   security.protocol=SASL_SSL
   sasl.mechanism=SCRAM-SHA-512
   sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="" password="x";
   ssl.truststore.location=/usr/lib/jvm/java/jre/lib/security/cacerts
   ssl.truststore.password=changeit
   ```

   **Table config properties:**
   ```
   hoodie.datasource.hive_sync.database=database1
   hoodie.datasource.hive_sync.support_timestamp=true
   hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
   hoodie.datasource.write.recordkey.field=studentsid
   hoodie.datasource.write.partitionpath.field=studentcreationdate
   hoodie.datasource.hive_sync.table=student
   hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true
   hoodie.datasource.hive_sync.partition_fields=studentcreationdate
   hoodie.keygen.timebased.timestamp.type=SCALAR
   hoodie.keygen.timebased.timestamp.scalar.time.unit=DAYS
   hoodie.keygen.timebased.input.dateformat=-MM-dd
   hoodie.keygen.timebased.output.dateformat=-MM-01
   hoodie.keygen.timebased.timezone=GMT+8:00
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.datasource.hive_sync.mode=hms
   hoodie.streamer.source.kafka.topic=dev.student
   hoodie.streamer.schemaprovider.registry.urlSuffix=-value/versions/latest
   ```
   
   **Environment Description**
   * Hudi version : 0.14.0
   * Spark version : 3.4.1
   * Hive version : 3.1.3
   * Hadoop version : 3.3.6
   * Storage (HDFS/S3/GCS..) : S3
   
   
   

