Omkar - there might be various reasons to keep duplicates, e.g., handling multiple trades in a given day from a single client, tracking visitor click data on a website, etc.

Rahul - If you can give more details about your requirements, then we can come 
up with a solution.
I have not used INSERT & BULK_INSERT myself, and I am not sure whether these 
options allow the user to specify the logic you are seeking. Without knowing 
your exact requirement, I can still suggest looking into implementing your own 
combineAndGetUpdateValue() logic.
Let's say all your values for a particular key are strings. You could append the 
new string value to the existing value and store them as:

key | Value
Rahul | Nice
// when another entry arrives, append it to the existing value with a comma 
// separator, say.

key | Value
Rahul | Nice, Person
When you retrieve the values for a key, you can then decide how to ship them 
back to the user - which is something you would know based on your requirement - 
since your JSON in any case carries multiple values for a key.
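To illustrate the idea, here is a minimal sketch of the append-on-merge logic. Note this is an assumption-laden illustration, not actual Hudi code: in a real implementation you would put this logic inside combineAndGetUpdateValue() of a custom HoodieRecordPayload class, working with Avro records rather than plain strings.

```java
// Sketch only: in Hudi, this merge logic would live inside
// combineAndGetUpdateValue() of a custom HoodieRecordPayload.
public class AppendMerge {

    // Combine the value already stored for a key with an incoming value
    // by appending the new value with a comma separator.
    public static String combine(String existing, String incoming) {
        if (existing == null || existing.isEmpty()) {
            return incoming;
        }
        return existing + ", " + incoming;
    }

    public static void main(String[] args) {
        String stored = combine(null, "Nice");   // first record for key "Rahul"
        stored = combine(stored, "Person");      // duplicate key arrives later
        System.out.println(stored);              // prints: Nice, Person
    }
}
```

On retrieval you would split the stored string back into individual values (e.g. on the comma separator) and return them however your application requires.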

Feel free to reach out if you need help and I will help you as much as I can.
On Apr 4 2019, at 6:35 pm, Omkar Joshi <[email protected]> wrote:
> Hi Rahul,
>
> Thanks for trying out Hudi!!
> Any reason why you need to have duplicates in HUDI dataset? Will you ever
> be updating it later?
>
> Thanks,
> Omkar
>
> On Thu, Apr 4, 2019 at 1:33 AM [email protected] <
> [email protected]> wrote:
>
> > Dear All
> > I am using cow table with INSERT/BULK_INSERT.
> > I am loading the data from json files.
> >
> > If an existing key in the Hudi dataset is loaded again, only the new data
> > for that key is shown. Can I show both records? (with INSERT)
> >
> > If the same key appears multiple times in a source JSON file, only one
> > record for that key gets loaded. Can I load duplicate keys from the same
> > file? (with both INSERT and BULK_INSERT)
> >
> >
> > Thanks & Regards
> > Rahul
>
>
