Hi Sivaprakash,

I'm by no means an expert on this, but I think you might find what you're looking for here: https://hudi.apache.org/docs/concepts.html
I'm not sure I fully understand the Step 2 you mentioned ("I'm writing 50 records out of which only 10 records have been changed"). Does that mean that you updated 10 records from Step 1? Or are you updating some of the other 40 records from Step 2?

Either way, I guess the key is that all deltas will be written; it's only after those records are written to disk that they are consolidated during the COMPACTION phase. I *BELIEVE* this is how it works. Take a look at COMPACTION under the timeline section here: https://hudi.apache.org/docs/concepts.html#timeline

Hope that helps a bit.

Allen

On Thu, Jul 16, 2020 at 7:23 AM Sivaprakash <[email protected]> wrote:

> This might be a basic question - I'm experimenting with Hudi (Pyspark). I
> have used Insert/Upsert options to write delta into my data lake. However,
> one thing is not clear to me
>
> Step 1:- I write 50 records
> Step 2:- I'm writing 50 records out of which only *10 records have been
> changed* (I'm using upsert mode & tried with MERGE_ON_READ also
> COPY_ON_WRITE)
> Step 3: I was expecting only 10 records will be written but it writes whole
> 50 records. Is this normal behaviour? Which means do I need to determine
> the delta myself and write them alone?
>
> Am I missing something?
>

--
*Allen Underwood*
Principal Software Engineer
Broadcom | Symantec Enterprise Division
*Mobile*: 404.808.5926
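
On the Step 3 question of determining the delta yourself: below is a minimal sketch, in plain Python rather than the Hudi or Spark API, of filtering an incoming batch down to the rows that are new or changed before handing them to an upsert. The function names, the `id` key field and the sample rows are all made up for illustration; in PySpark the same idea would presumably be expressed as a join against the existing table on the record key.

```python
import hashlib
import json


def row_fingerprint(row, key_field):
    """Hash every non-key field so two versions of a record can be compared cheaply."""
    payload = {k: v for k, v in row.items() if k != key_field}
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def changed_or_new(existing_rows, incoming_rows, key_field="id"):
    """Return only the incoming rows that are new or differ from the existing copy."""
    known = {row[key_field]: row_fingerprint(row, key_field) for row in existing_rows}
    return [
        row for row in incoming_rows
        if known.get(row[key_field]) != row_fingerprint(row, key_field)
    ]


# Step 1: 50 records already written (hypothetical sample data).
step1 = [{"id": i, "value": f"v{i}"} for i in range(50)]
# Step 2: the same 50 keys, with the first 10 records modified.
step2 = [
    {"id": i, "value": f"v{i}-updated" if i < 10 else f"v{i}"}
    for i in range(50)
]

delta = changed_or_new(step1, step2)
print(len(delta))  # 10
```

Only `delta` would then be passed to the upsert, so the write sees 10 records instead of 50. Whether this pre-filtering is worth the extra read of the existing data depends on the table size and how small the changed fraction usually is.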
