samserpoosh commented on issue #9143: URL: https://github.com/apache/hudi/issues/9143#issuecomment-1639102139
FWIW, I'm seeing an identical issue on my end. The `before` is **not** populated correctly, and all fields have **default** values instead. So, as Sydney pointed out, this produces a **wrong** partition key, which leaves DeltaStreamer unable to find the right partition and, ultimately, the right record to **delete**.

> or if there could be a workaround in Deltastreamer that allows it to delete the record without knowing what partition it is from.

@sydneyhoran IIUC, when dealing with **partitioned datasets/Hudi tables**, uniqueness is at the **partition level** rather than global. Per the Hudi [documentation](https://hudi.apache.org/docs/key_generation/):

> In general, Hudi supports both partitioned and global indexes. For a dataset with partitioned index(which is most commonly used), each record is uniquely identified by a pair of record key and partition path. But for a dataset with global index, each record is uniquely identified by just the record key. There won't be any duplicate record keys across partitions.

So I **think** that, since we're using partitioned datasets, there is no global uniqueness. DeltaStreamer first zeroes in on the partition and then on the record, rather than doing a global lookup for the record. That's my understanding, but we'll see whether the Hudi team disagrees with this interpretation.
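To make the partitioned-vs-global distinction concrete, here is a toy model in Python. It is a hypothetical sketch, not Hudi's actual index implementation: `PartitionedIndex`, `GlobalIndex`, and the partition-path values are made up for illustration. It shows why a delete that carries a wrong partition path (as happens when `before` is filled with default values) misses the record under a partitioned index but would still find it under a global one.

```python
# Hypothetical toy model of record identity; NOT Hudi's real index code.

class PartitionedIndex:
    """A record is identified by the pair (record_key, partition_path)."""

    def __init__(self):
        self._rows = {}  # (record_key, partition_path) -> value

    def upsert(self, record_key, partition_path, value):
        self._rows[(record_key, partition_path)] = value

    def delete(self, record_key, partition_path):
        # Removes the row only if BOTH the key and the partition path match.
        return self._rows.pop((record_key, partition_path), None) is not None

    def __len__(self):
        return len(self._rows)


class GlobalIndex:
    """A record is identified by record_key alone; the partition is just stored."""

    def __init__(self):
        self._rows = {}  # record_key -> (partition_path, value)

    def upsert(self, record_key, partition_path, value):
        self._rows[record_key] = (partition_path, value)

    def delete(self, record_key, partition_path):
        # partition_path is ignored for the lookup, so a wrong value still matches.
        return self._rows.pop(record_key, None) is not None

    def __len__(self):
        return len(self._rows)


if __name__ == "__main__":
    correct_pp = "2023/07/17"  # hypothetical partition where the row really lives
    wrong_pp = "1970/01/01"    # hypothetical path derived from a defaulted date field

    part = PartitionedIndex()
    part.upsert("id-42", correct_pp, {"id": 42})
    print(part.delete("id-42", wrong_pp))  # False: nothing at ("id-42", "1970/01/01")
    print(len(part))                       # 1: the record is still there

    glob = GlobalIndex()
    glob.upsert("id-42", correct_pp, {"id": 42})
    print(glob.delete("id-42", wrong_pp))  # True: found by record key alone
    print(len(glob))                       # 0
```

Under the partitioned model, the only way the delete can land is if the partition path is derived correctly, which is exactly what the defaulted `before` values break.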