Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2024-01-31 Thread via GitHub


codope closed issue #10303: [SUPPORT] CoW: Hudi Upsert not working when there 
is a timestamp field in the composite key 
URL: https://github.com/apache/hudi/issues/10303


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1919013789

   @srinikandi Closing out this issue. Please reopen it if you still face the 
problem after setting 
`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`.
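
For readers landing on this closing comment, here is a minimal sketch of adding that option to an existing set of PySpark writer options. The helper name `with_consistent_timestamp_keys` and the sample option values are illustrative assumptions, not part of Hudi; only the config key itself comes from this thread. The thread does not cover whether keys already written in the other format need to be rewritten, so treat this as a starting point only.

```python
# Config key quoted in this thread.
CONSISTENT_TS_KEY = (
    "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled"
)


def with_consistent_timestamp_keys(options: dict) -> dict:
    """Return a copy of the writer options with the flag set to "true".

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    updated = dict(options)
    updated[CONSISTENT_TS_KEY] = "true"
    return updated


# Sample options modelled on the upsert config posted in this issue.
base_options = {
    "hoodie.table.name": "account_master",
    "hoodie.datasource.write.recordkey.field": "acct_id,eff_start_datetime",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": "upsert",
}
hudi_options = with_consistent_timestamp_keys(base_options)

# With a Spark session and the Hudi bundle available, the options would be used as:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

The same dictionary should be passed to both the bulk_insert and the upsert writers so that both paths generate keys under the same setting.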





Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2024-01-16 Thread via GitHub


ad1happy2go commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1895009159

   @srinikandi Sorry for the delay on this. 
   
   I was able to reproduce the issue with Hudi versions 0.12.1 and 0.14.1. We 
have introduced the config 
"hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled"; 
setting it to true resolves the issue.
   
   ```
   public static final ConfigProperty KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED = ConfigProperty
       .key("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled")
       .defaultValue("false")
       .withDocumentation("When set to true, consistent value will be generated for a logical timestamp type column, "
           + "like timestamp-millis and timestamp-micros, irrespective of whether row-writer is enabled. Disabled by default so "
           + "as not to break the pipeline that deploy either fully row-writer path or non row-writer path. For example, "
           + "if it is kept disabled then record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp "
           + "`2016-12-29 09:54:00.0` in row-writer path, while it will be written as long value `148302324000` in non row-writer path. "
           + "If enabled, then the timestamp value will be written in both the cases.");
   ```
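
To make the documentation string above concrete, the following sketch shows why the two renderings of one timestamp can never match as record keys. It is plain Python, not Hudi code: the `field:value` comma-joined layout only approximates how a composite key reads, the choice of microseconds for the long form is an assumption, and the value is computed in UTC, so it will not equal the number quoted in the documentation string.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_key_part(ts: datetime) -> str:
    """Render the timestamp as a formatted string (fractional seconds ignored)."""
    return ts.strftime("%Y-%m-%d %H:%M:%S") + ".0"


def epoch_key_part(ts: datetime) -> str:
    """Render the same instant as a long: microseconds since the Unix epoch."""
    return str((ts - EPOCH) // timedelta(microseconds=1))


def composite_key(fields: dict) -> str:
    """Join field:value pairs with commas (illustrative key layout)."""
    return ",".join(f"{name}:{value}" for name, value in fields.items())


event_time = datetime(2016, 12, 29, 9, 54, 0, tzinfo=timezone.utc)

# The same logical record, keyed once per rendering.
key_a = composite_key({"storeNbr": "1", "EventTime": timestamp_key_part(event_time)})
key_b = composite_key({"storeNbr": "1", "EventTime": epoch_key_part(event_time)})

print(key_a)
print(key_b)
print(key_a == key_b)  # False: an upsert keyed by key_b cannot find key_a
```

Because the keys differ as strings, a write that produces the second form is treated as an insert of a new record instead of an update to the first.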
   
   Reproducible code, which works once the config is set:
   
   ```
   from faker import Faker
   import pandas as pd
   from pyspark.sql import SparkSession
   import pyspark.sql.functions as F
   
   # Assumes a Spark session with the Hudi bundle on the classpath.
   spark = SparkSession.builder.getOrCreate()
   PATH = "file:///tmp/hudi_issue_10303"  # placeholder; use any writable base path
   
   # Fake data generation
   fake = Faker()
   data = [{"transactionId": fake.uuid4(), "EventTime": "2014-01-01 23:00:01", "storeNbr": "1",
            "FullName": fake.name(), "Address": fake.address(),
            "CompanyName": fake.company(), "JobTitle": fake.job(),
            "EmailAddress": fake.email(), "PhoneNumber": fake.phone_number(),
            "RandomText": fake.sentence(), "City": fake.city(),
            "State": "NYC", "Country": "US"} for _ in range(5)]
   pandas_df = pd.DataFrame(data)
   
   hoodi_configs = {
       "hoodie.insert.shuffle.parallelism": "1",
       "hoodie.upsert.shuffle.parallelism": "1",
       "hoodie.bulkinsert.shuffle.parallelism": "1",
       "hoodie.delete.shuffle.parallelism": "1",
       "hoodie.datasource.write.row.writer.enable": "true",
       "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
       "hoodie.datasource.write.recordkey.field": "transactionId,storeNbr,EventTime",
       "hoodie.datasource.write.precombine.field": "Country",
       "hoodie.datasource.write.partitionpath.field": "State",
       "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
       "hoodie.datasource.write.hive_style_partitioning": "true",
       "hoodie.combine.before.upsert": "true",
       "hoodie.table.name": "huditransaction",
       # "false" reproduces the mismatched keys; set to "true" for consistent keys.
       "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "false",
   }
   spark.sparkContext.setLogLevel("WARN")
   
   # Cast the string column so EventTime is a real timestamp in the record key.
   df = spark.createDataFrame(pandas_df).withColumn("EventTime", F.expr("cast(EventTime as timestamp)"))
   
   # Initial load through bulk_insert, then inspect the generated record keys.
   (df.write.format("hudi").options(**hoodi_configs)
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .mode("overwrite").save(PATH))
   spark.read.options(**hoodi_configs).format("hudi").load(PATH).select("_hoodie_record_key").show(10, False)
   
   # Upsert the same rows with one changed column, then inspect the keys again.
   (df.withColumn("City", F.lit("updated_city")).write.format("hudi").options(**hoodi_configs)
      .option("hoodie.datasource.write.operation", "upsert")
      .mode("append").save(PATH))
   spark.read.options(**hoodi_configs).format("hudi").load(PATH).select("_hoodie_record_key").show(10, False)
   ```
   
   Let me know in case you need any more help on this. Thanks.





Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


ad1happy2go commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1851290166

   @srinikandi I see a fix (https://github.com/apache/hudi/pull/4201) was tried, 
but it was then reverted due to another issue. Will look into it. Thanks for 
raising this again.





Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


srinikandi commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1850939289

   Screenshot of the data showing the different behavior between bulk_insert and 
upsert on the same set of records.





Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


srinikandi commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1850932554

   Hudi Config used for Upsert
   {'encoding': 'utf-8',
    'className': 'org.apache.hudi',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.table.name': 'account_master',
    'hoodie.datasource.hive_sync.table': 'account_master',
    'hoodie.datasource.write.recordkey.field': 'acct_id,eff_start_datetime',
    'hoodie.datasource.write.precombine.field': 'updated_datetime',
    'hoodie.datasource.hive_sync.database': 'account_core',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.write.partitionpath.field': '',
    'hoodie.datasource.hive_sync_mode': 'hms',
    'hoodie.datasource.write.operation': 'upsert'}





[I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


srinikandi opened a new issue, #10303:
URL: https://github.com/apache/hudi/issues/10303

   Hi, we have been facing an issue with Hudi upserts converting a timestamp 
field that is part of the composite primary key.
   The bulk insert on the table works fine and stores the timestamp in a 
proper timestamp format. But when the same table receives an upsert operation 
(Type 2 SCD), the newly inserted row has its timestamp value converted to epoch 
in `_hoodie_record_key`. The actual attribute in the table still holds the data 
in proper timestamp format. This breaks the Type 2 SCD we are trying to 
achieve, because all subsequent updates are treated as new records.
   
   Steps to reproduce the behavior:
   
   1. Created a CoW table using bulk_insert, with a timestamp field as part of 
the complex primary key
   2. Performed upserts on the same table; in the record key, the timestamp 
field value is converted to INT
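
One way to spot rows affected by the steps above is to check whether the timestamp component of each record key is purely numeric. The sketch below is a plain-Python illustration under stated assumptions: the key layout (`field:value` pairs joined by commas), the sample keys, and the helper names are hypothetical, and the comma split would misbehave if a key value itself contained a comma.

```python
def split_key(record_key: str) -> dict:
    """Split a composite key of the assumed form 'f1:v1,f2:v2' into a dict."""
    parts = {}
    for pair in record_key.split(","):
        # Partition on the first colon only, since timestamps contain colons.
        name, _, value = pair.partition(":")
        parts[name] = value
    return parts


def has_epoch_timestamp(record_key: str, field: str) -> bool:
    """True when the given key field holds only digits, i.e. an epoch-style long."""
    return split_key(record_key).get(field, "").isdigit()


# Hypothetical keys: the first as written by bulk_insert, the second by upsert.
keys = [
    "acct_id:42,eff_start_datetime:2023-01-01 00:00:00.0",
    "acct_id:42,eff_start_datetime:1672531200000000",
]
flagged = [k for k in keys if has_epoch_timestamp(k, "eff_start_datetime")]
print(flagged)
```

In practice the keys would come from the `_hoodie_record_key` column of the table, and any flagged key points to a row that was inserted as new instead of updating its original.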
   
   We are using Glue with Hudi 0.12.1
   
   
   * Hudi version : 0.12.1
   
   * Spark version : 3.3
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   There was an issue opened about 2 years back; it was closed with no 
resolution mentioned.
   https://github.com/apache/hudi/issues/3313
   





Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


srinikandi commented on issue #3313:
URL: https://github.com/apache/hudi/issues/3313#issuecomment-1850901209

   This issue still exists for the upsert operation with Hudi 0.12.1. Is there 
a workaround or a fix for this? Bulk insert works fine, but when we upsert and 
the timestamp is part of the complex key, the timestamp is converted to INT.

