Hi Sivaprakash,

I'm by no means an expert on this, but I think you might find what
you're looking for in the Hudi concepts docs:
https://hudi.apache.org/docs/concepts.html

I'm not sure I fully understand the Step 2 you mentioned ("I'm writing 50
records out of which only 10 records have been changed"). Does that mean
you updated 10 of the records from Step 1, or are you updating some of
the other 40 records in Step 2?

Either way, I think the key point is that all deltas get written first.
Only after those records are on disk are they consolidated, during the
COMPACTION phase.  I *BELIEVE* this is how it works.
Take a look at COMPACTION under the timeline section here:
https://hudi.apache.org/docs/concepts.html#timeline
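
On your last question (working out the delta yourself): I can't say
whether Hudi needs you to do that, but if you decide to pre-filter the
batch before the upsert, here is a minimal sketch in plain Python. It
makes no Hudi or Spark calls; the helper name changed_records, the "id"
key and the sample records are all made up for illustration.

```python
def changed_records(previous, incoming, key="id"):
    """Return the incoming records that are new or differ from `previous`.

    Hypothetical helper: `previous` and `incoming` are lists of dicts,
    and `key` names the field that identifies a record.
    """
    previous_by_key = {rec[key]: rec for rec in previous}
    # A record counts as changed if its key is unseen or its content differs.
    return [rec for rec in incoming if previous_by_key.get(rec[key]) != rec]


# Step 1: 50 records. Step 2: the same 50 keys, with 10 records modified.
step1 = [{"id": i, "value": f"v{i}"} for i in range(50)]
step2 = [
    {"id": i, "value": f"v{i}-updated" if i < 10 else f"v{i}"}
    for i in range(50)
]

delta = changed_records(step1, step2)
print(len(delta))
```

With a real table you would compare on your record key field instead
of "id", and you would probably do the comparison in Spark rather than
in driver-side Python.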

Hope that helps a bit.

Allen

On Thu, Jul 16, 2020 at 7:23 AM Sivaprakash <[email protected]>
wrote:

> This might be a basic question - I'm experimenting with Hudi (Pyspark). I
> have used Insert/Upsert options to write delta into my data lake. However,
> one thing is not clear to me:
>
> Step 1:- I write 50 records
> Step 2:- I'm writing 50 records, out of which only *10 records have been
> changed* (I'm using upsert mode and tried both MERGE_ON_READ and
> COPY_ON_WRITE)
> Step 3:- I was expecting only 10 records to be written, but it writes all
> 50 records. Is this normal behaviour? Does that mean I need to determine
> the delta myself and write only those records?
>
> Am I missing something?
>


-- 
*Allen Underwood*
Principal Software Engineer
Broadcom | Symantec Enterprise Division
*Mobile*: 404.808.5926
