Hi Sudha,

Yes, I did check: the number of distinct row_key values matches. My understanding is that row_key is not the key used for de-duplication. My row_key is not unique (several rows may share the same row_key), but the pre-combine key is definitely unique.
Thanks,
Pan

On Wed, Nov 13, 2019 at 2:54 PM Bhavani Sudha <[email protected]> wrote:
> Hi Zhengxiang,
>
> Regarding issue 2, were you able to confirm if the number of distinct
> row_key in your original df and the distinct row_key in the Hudi dataset
> matches? If that matches, then we can dig into the precombine logic to see
> what's happening.
>
> Thanks,
> Sudha
>
> On Tue, Nov 12, 2019 at 9:42 AM Zhengxiang Pan <[email protected]> wrote:
>
> > Hi Balaji.V,
> > W.r.t. issue 1), the same issue occurs with Spark 2.3.4.
> >
> > Pan
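[For context on the precombine discussion above: Hudi collapses rows that share a record key and, via the pre-combine field, keeps the "latest" row per key. A minimal pure-Python sketch of that semantics follows; the function and field names here are illustrative, not Hudi's actual API. It shows why the distinct row_key count can match while the total row count shrinks.]

```python
# Illustrative sketch (not Hudi's API): per record key, keep the row
# with the largest precombine value, discarding the others.
def dedup_by_record_key(rows, record_key, precombine_key):
    latest = {}
    for row in rows:
        key = row[record_key]
        if key not in latest or row[precombine_key] > latest[key][precombine_key]:
            latest[key] = row
    return list(latest.values())

rows = [
    {"row_key": "a", "ts": 1, "val": "old"},
    {"row_key": "a", "ts": 2, "val": "new"},   # same row_key, later ts wins
    {"row_key": "b", "ts": 5, "val": "only"},
]
deduped = dedup_by_record_key(rows, "row_key", "ts")
# 3 input rows collapse to 2 output rows, one per distinct row_key.
```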
