Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]
codope closed issue #10303: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key
URL: https://github.com/apache/hudi/issues/10303

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
ad1happy2go commented on issue #10303: URL: https://github.com/apache/hudi/issues/10303#issuecomment-1919013789

@srinikandi Closing out this issue. Please reopen in case you still face this issue after setting `hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`.
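The suggested fix is a single write option. A minimal sketch of enabling it, assuming an active SparkSession with the Hudi bundle; the table name and key fields below are hypothetical placeholders, not from this issue:

```python
# Hedged sketch: enabling the flag discussed in this thread so that timestamp
# record-key components are encoded the same way on both the row-writer
# (bulk_insert) and non-row-writer (upsert) paths.
hudi_options = {
    "hoodie.table.name": "my_table",  # hypothetical
    "hoodie.datasource.write.recordkey.field": "id,event_ts",  # hypothetical composite key
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    # The fix from this thread:
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "true",
}

# Applied to a write (df and path assumed to exist):
# df.write.format("hudi").options(**hudi_options) \
#     .option("hoodie.datasource.write.operation", "upsert") \
#     .mode("append").save("/tmp/my_table")
```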
ad1happy2go commented on issue #10303: URL: https://github.com/apache/hudi/issues/10303#issuecomment-1895009159

@srinikandi Sorry for the delay on this. I was able to reproduce the issue with Hudi versions 0.12.1 and 0.14.1. We have introduced the config `hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`; you can set it to `true`.

```
public static final ConfigProperty<String> KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED = ConfigProperty
    .key("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled")
    .defaultValue("false")
    .withDocumentation("When set to true, consistent value will be generated for a logical timestamp type column, "
        + "like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so "
        + "as not to break the pipeline that deploy either fully row-writer path or non row-writer path. For example, "
        + "if it is kept disabled then record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp "
        + "`2016-12-29 09:54:00.0` in row-writer path, while it will be written as long value `148302324000` in non row-writer path. "
        + "If enabled, then the timestamp value will be written in both the cases.");
```

Reproducible code, which works when we set the config:

```
from faker import Faker
import pandas as pd
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# A SparkSession with the Hudi bundle on the classpath and a target `PATH`
# are assumed to be defined in the session this snippet was run in.
spark = SparkSession.builder.getOrCreate()

# ... Fake Data Generation ...
fake = Faker()
data = [{"transactionId": fake.uuid4(), "EventTime": "2014-01-01 23:00:01", "storeNbr": "1",
         "FullName": fake.name(), "Address": fake.address(), "CompanyName": fake.company(),
         "JobTitle": fake.job(), "EmailAddress": fake.email(), "PhoneNumber": fake.phone_number(),
         "RandomText": fake.sentence(), "City": fake.city(), "State": "NYC", "Country": "US"}
        for _ in range(5)]
pandas_df = pd.DataFrame(data)

hoodie_configs = {
    "hoodie.insert.shuffle.parallelism": "1",
    "hoodie.upsert.shuffle.parallelism": "1",
    "hoodie.bulkinsert.shuffle.parallelism": "1",
    "hoodie.delete.shuffle.parallelism": "1",
    "hoodie.datasource.write.row.writer.enable": "true",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.recordkey.field": "transactionId,storeNbr,EventTime",
    "hoodie.datasource.write.precombine.field": "Country",
    "hoodie.datasource.write.partitionpath.field": "State",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.combine.before.upsert": "true",
    "hoodie.table.name": "huditransaction",
    # "false" reproduces the issue; set to "true" to get consistent record keys
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "false",
}

spark.sparkContext.setLogLevel("WARN")
df = spark.createDataFrame(pandas_df).withColumn("EventTime", F.expr("cast(EventTime as timestamp)"))

df.write.format("hudi").options(**hoodie_configs) \
    .option("hoodie.datasource.write.operation", "bulk_insert").mode("overwrite").save(PATH)
spark.read.options(**hoodie_configs).format("hudi").load(PATH).select("_hoodie_record_key").show(10, False)

df.withColumn("City", F.lit("updated_city")).write.format("hudi").options(**hoodie_configs) \
    .option("hoodie.datasource.write.operation", "upsert").mode("append").save(PATH)
spark.read.options(**hoodie_configs).format("hudi").load(PATH).select("_hoodie_record_key").show(10, False)
```

Let me know in case you need any more help on this. Thanks.
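The docstring above explains why the keys diverge: one write path renders the timestamp as a string, the other renders the underlying epoch long. A small standalone illustration of that encoding difference, in plain Python with no Hudi dependency (values computed in UTC; Hudi's exact long depends on the column's logical type, millis vs. micros, and the writer's timezone handling, so this is illustrative only):

```python
from datetime import datetime, timezone

# The same logical timestamp rendered two ways, mirroring the divergence the
# docstring describes: a human-readable string vs. an epoch long.
ts = datetime(2016, 12, 29, 9, 54, 0, tzinfo=timezone.utc)

as_string = ts.strftime("%Y-%m-%d %H:%M:%S")   # row-writer-style key component
as_epoch_millis = int(ts.timestamp() * 1000)   # non-row-writer-style key component

print(as_string)        # 2016-12-29 09:54:00
print(as_epoch_millis)  # 1483005240000
```

A composite record key built from these two representations will never match across writes, which is exactly why upserts land as new rows instead of updates.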
ad1happy2go commented on issue #10303: URL: https://github.com/apache/hudi/issues/10303#issuecomment-1851290166

@srinikandi I see a fix (https://github.com/apache/hudi/pull/4201) was tried, but it was then reverted due to another issue. Will look into it. Thanks for raising this again.
srinikandi commented on issue #10303: URL: https://github.com/apache/hudi/issues/10303#issuecomment-1850939289

Screenshot of the data showing the different behavior between bulk_insert and upsert on the same set of records.
srinikandi commented on issue #10303: URL: https://github.com/apache/hudi/issues/10303#issuecomment-1850932554

Hudi config used for the upsert:

```
{'encoding': 'utf-8',
 'className': 'org.apache.hudi',
 'hoodie.datasource.hive_sync.enable': 'true',
 'hoodie.datasource.hive_sync.use_jdbc': 'false',
 'hoodie.datasource.hive_sync.support_timestamp': 'true',
 'hoodie.table.name': 'account_master',
 'hoodie.datasource.hive_sync.table': 'account_master',
 'hoodie.datasource.write.recordkey.field': 'acct_id,eff_start_datetime',
 'hoodie.datasource.write.precombine.field': 'updated_datetime',
 'hoodie.datasource.hive_sync.database': 'account_core',
 'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
 'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
 'hoodie.datasource.write.partitionpath.field': '',
 'hoodie.datasource.hive_sync_mode': 'hms',
 'hoodie.datasource.write.operation': 'upsert'}
```
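Based on the maintainer's suggestion elsewhere in this thread, the likely remedy is to add the consistent-logical-timestamp flag to this same config. A hedged sketch of that change, assuming a Hudi version where the flag exists; `base_config` stands in for the dict posted above and is abbreviated here:

```python
# Sketch: layering the suggested flag on top of the reporter's upsert config.
# Only a few representative keys from the posted config are repeated here.
base_config = {
    "hoodie.datasource.write.recordkey.field": "acct_id,eff_start_datetime",
    "hoodie.datasource.write.precombine.field": "updated_datetime",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": "upsert",
    # ... remaining options as posted ...
}

fixed_config = {
    **base_config,
    # The flag suggested in this thread; keeps the timestamp component of the
    # composite key (eff_start_datetime) encoded consistently across write paths.
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "true",
}
```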
srinikandi commented on issue #3313: URL: https://github.com/apache/hudi/issues/3313#issuecomment-1850901209

This issue still exists for the upsert operation with Hudi 0.12.1. Is there a workaround or a fix for this? Bulk insert works fine, but when we upsert and a timestamp is part of the complex key, the timestamp is converted to an INT (epoch value) in the record key.