Hi Sudha,

Yes, I did check: the number of distinct row_key values matches. My understanding is that row_key is not the key used for de-duplication. My row_key is not unique (several rows may share the same row_key), but the pre-combine key is definitely unique.
Thanks,
Pan

On Wed, Nov 13, 2019 at 2:54 PM Bhavani Sudha <[email protected]> wrote:
> Hi Zhengxiang,
>
> Regarding issue 2, were you able to confirm if the number of distinct
> row_key in your original df and the distinct row_key in the Hudi dataset
> matches? If that matches, then we can dig into the precombine logic to see
> what's happening.
>
> Thanks,
> Sudha
>
> On Tue, Nov 12, 2019 at 9:42 AM Zhengxiang Pan <[email protected]> wrote:
>
> > Hi Balaji.V,
> > W.r.t. issue 1), the same issue occurs with Spark 2.3.4.
> >
> > Pan
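[For context on the precombine discussion above: Hudi collapses rows that share a record key and, via the pre-combine field, keeps the "latest" row per key. A minimal pure-Python sketch of that semantics follows; the function and field names here are illustrative, not Hudi's actual API. It shows why the distinct row_key count can match while the total row count shrinks.]

```python
# Illustrative sketch (not Hudi's API): per record key, keep the row
# with the largest precombine value, discarding the others.
def dedup_by_record_key(rows, record_key, precombine_key):
    latest = {}
    for row in rows:
        key = row[record_key]
        if key not in latest or row[precombine_key] > latest[key][precombine_key]:
            latest[key] = row
    return list(latest.values())

rows = [
    {"row_key": "a", "ts": 1, "val": "old"},
    {"row_key": "a", "ts": 2, "val": "new"},   # same row_key, later ts wins
    {"row_key": "b", "ts": 5, "val": "only"},
]
deduped = dedup_by_record_key(rows, "row_key", "ts")
# 3 input rows collapse to 2 output rows, one per distinct row_key.
```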
