Omkar - there might be various reasons to have duplicates, e.g. handling trades in a given day from a single client, or tracking visitor click data for the website.
Rahul - If you can give more details about your requirements, then we can come up with a solution. I have never used INSERT or BULK_INSERT, and I am not sure whether those options allow the user to specify the logic you are seeking. Without knowing your exact requirement, I can still suggest looking into implementing your own combineAndGetUpdateValue() logic.

Let's say all your values for a particular key are strings. You could append new string values to the existing value with a comma separator, and store them as:

key   | Value
Rahul | Nice

// when another entry arrives, append it to the existing value:

key   | Value
Rahul | Nice, Person

When you retrieve the key's value, you can then decide how to ship it back to the user, which is something you would know based on your requirement, since your JSON in any case has multiple ways to insert values for a key. Feel free to reach out if you need help, and I will help you as much as I can.

On Apr 4 2019, at 6:35 pm, Omkar Joshi <[email protected]> wrote:
> Hi Rahul,
>
> Thanks for trying out Hudi!! Any reason why you need to have duplicates
> in the Hudi dataset? Will you ever be updating it later?
>
> Thanks,
> Omkar
>
> On Thu, Apr 4, 2019 at 1:33 AM [email protected] <
> [email protected]> wrote:
>
> > Dear All,
> > I am using a COW table with INSERT/BULK_INSERT, and I am loading the
> > data from JSON files.
> >
> > If an existing key in the Hudi dataset is loaded again, only the new
> > data for that key is shown. Can I show both records? (In INSERT)
> >
> > If the same key appears multiple times in a source JSON file, only one
> > record for that key gets loaded. Can I load duplicate keys from the
> > same file? (Both INSERT/BULK_INSERT)
> >
> > Thanks & Regards,
> > Rahul
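The append-on-duplicate merge described above can be sketched in plain Java. This is only an illustration of the combine logic, not Hudi's API: in Hudi you would put equivalent logic inside a custom record payload's combineAndGetUpdateValue(), working on Avro records rather than a Map. The class name AppendMerge and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of replacing the existing value for a key,
// concatenate the incoming value onto it with a comma separator. A custom
// Hudi payload's combineAndGetUpdateValue() could apply the same idea.
public class AppendMerge {
    private final Map<String, String> store = new HashMap<>();

    public void upsert(String key, String value) {
        // Map.merge keeps both values, joined by ", ", when the key already exists
        store.merge(key, value, (existing, incoming) -> existing + ", " + incoming);
    }

    public String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        AppendMerge m = new AppendMerge();
        m.upsert("Rahul", "Nice");
        m.upsert("Rahul", "Person");
        System.out.println(m.get("Rahul")); // prints "Nice, Person"
    }
}
```

On retrieval you would split the stored string back into individual values, which matches the suggestion above that how you ship it back to the user depends on your requirement.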
