On 2019/04/04 19:48:39, Kabeer Ahmed <[email protected]> wrote: 
> Omkar - there can be various reasons to have duplicates, e.g. handling trades 
> from a single client on a given day, tracking visitor click data on a website, 
> etc.
> 
> Rahul - if you can give more details about your requirements, we can 
> come up with a solution.
> I have never used INSERT or BULK_INSERT, and I am not sure whether these 
> options allow the user to specify the logic that you are seeking. Even 
> without knowing your exact requirement, I can suggest looking into 
> implementing your own combineAndGetUpdateValue() logic.
> Let's say all your values for a particular key are strings. You could append 
> each new string value to the existing value and store them as:
> 
> key | Value
> Rahul | Nice
> // When another entry arrives for the same key, append its value to the 
> // existing one, with a comma separator, say.
> 
> key | Value
> Rahul | Nice, Person
> When you retrieve the value for a key, you can then decide how to return it to 
> the user, which you would know from your requirements, since your JSON 
> already allows multiple values to be inserted for a key.
> 
> Feel free to reach out if you need help and I will help you as much as I can.
> On Apr 4 2019, at 6:35 pm, Omkar Joshi <[email protected]> wrote:
> > Hi Rahul,
> >
> > Thanks for trying out Hudi!!
> > Any reason why you need to have duplicates in a Hudi dataset? Will you ever
> > be updating it later?
> >
> > Thanks,
> > Omkar
> >
> > On Thu, Apr 4, 2019 at 1:33 AM [email protected] <
> > [email protected]> wrote:
> >
> > > Dear All
> > > I am using a COW table with INSERT/BULK_INSERT.
> > > I am loading the data from JSON files.
> > >
> > > If a key that already exists in the Hudi dataset is loaded again, only the
> > > new data for that key is shown. Is it possible to show both records? (with
> > > INSERT)
> > >
> > > If the same key appears multiple times in a source JSON file, only one
> > > record for that key is loaded. Is it possible to load duplicate keys from
> > > the same file? (with both INSERT and BULK_INSERT)
> > >
> > >
> > > Thanks & Regards
> > > Rahul
> >
> >
> 
> 
Dear Omkar/Kabeer

In one of my use cases, I don't want updates at all. In the JSON file, I will 
pass a fixed value for the key field every time. Currently, if I load data like 
this, only one entry per file is loaded. I don't want values with the same key 
to be skipped while inserting.

Thanks & Regards
Rahul
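
For reference, the append-on-duplicate idea Kabeer describes above (Rahul | Nice
becoming Rahul | Nice, Person) can be sketched in plain Python. This is only an
illustration of the merge logic, not Hudi code: the function names and the
separator are assumptions made for the example, and in Hudi the equivalent logic
would have to live in a custom combineAndGetUpdateValue() implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed separator for the example; values containing it would be ambiguous.
SEPARATOR = ", "


def append_on_duplicate(
    existing: Dict[str, str], incoming: Iterable[Tuple[str, str]]
) -> Dict[str, str]:
    """Merge incoming (key, value) pairs into existing, appending on key clash.

    Hypothetical helper: instead of overwriting the stored value when a key
    is seen again, the new value is appended after a comma separator.
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in incoming:
        if key in merged:
            merged[key] = merged[key] + SEPARATOR + value
        else:
            merged[key] = value
    return merged


def split_values(stored: str) -> List[str]:
    """Recover the individual values when reading a key back."""
    return stored.split(SEPARATOR)


if __name__ == "__main__":
    table = append_on_duplicate({"Rahul": "Nice"}, [("Rahul", "Person")])
    print(table)
    print(split_values(table["Rahul"]))
```

Because the loop handles one pair at a time, repeated keys inside a single
incoming batch are appended as well, which matches the "same key multiple times
in one file" case. The sketch keeps every value but still stores one row per
key, so it does not produce separate duplicate records.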
