Sure, I'm working on it. I'll add you as a co-author when I create a PR.

On Fri, Mar 25, 2022 at 1:17 AM Vinoth Chandar <vin...@apache.org> wrote:

> +1. Love to be a co-author on the RFC, if you are open to it.
>
> On Mon, Mar 21, 2022 at 12:31 PM 冯健 <fengjian...@gmail.com> wrote:
>
> > Hi team,
> >
> > The situation is that optimistic concurrency control (OCC) has some limitations:
> >
> >    - When conflicts do occur, they may waste massive resources on every
> >      attempt (lakehouse-concurrency-control-are-we-too-optimistic
> >      <https://urldefense.proofpoint.com/v2/url?u=https-3A__hudi.apache.org_blog_2021_12_16_lakehouse-2Dconcurrency-2Dcontrol-2Dare-2Dwe-2Dtoo-2Doptimistic&d=DwIFaQ&c=R1GFtfTqKXCFH-lgEPXWwic6stQkW4U7uVq33mt-crw&r=bXAq09cDo2vOJ-2Uz9h3CslJmeCj9JMbo5X-gCHPF24&m=rz6Mo5568KcwmokXd967obpw0RNDcDJepfrUmf9KUxgfK14-uOfJSLb4l7xpCxqp&s=GFRt00qSBTRTWbGjUo-UBInLiU88zE_YbvHP0UO_geE&e=>).
> >    - Multiple writers may produce duplicate data when records with the same
> >      new record key arrive at different writers (multi-writer-guarantees
> >      <https://urldefense.proofpoint.com/v2/url?u=https-3A__hudi.apache.org_docs_concurrency-5Fcontrol-23multi-2Dwriter-2Dguarantees&d=DwIFaQ&c=R1GFtfTqKXCFH-lgEPXWwic6stQkW4U7uVq33mt-crw&r=bXAq09cDo2vOJ-2Uz9h3CslJmeCj9JMbo5X-gCHPF24&m=rz6Mo5568KcwmokXd967obpw0RNDcDJepfrUmf9KUxgfK14-uOfJSLb4l7xpCxqp&s=H7a3yrvObNIz8WpuChSWN9X8fKpMslfTeiRJ29U3Tkg&e=>).
> >
> >
> > Some background: with OCC, we assume that multiple writers rarely write
> > data to the same FileID, and if there is a FileID-level conflict, the
> > commit is rolled back. However, FileID-level conflict detection cannot
> > guarantee the absence of duplicates when two records with the same new
> > record key arrive at different writers, because with the bloom index the
> > key-to-file-group mapping is not consistent: each writer may route the
> > same new key to a different file group, so no conflict is detected.
> >
> > What I plan to do is support lock-free concurrency control with a
> > no-duplicates guarantee in Hudi (only for Merge-on-Read tables).
> >
> >    - With an index that can index log files (canIndexLogFiles), multiple
> >      writers ingesting data into Merge-on-Read tables only append data to
> >      delta logs. This process is lock-free as long as we make sure no two
> >      writers write to the same log file (I plan to create multiple marker
> >      files to achieve this). With the log merge API (the preCombine logic
> >      in the Payload class), data in the log files can then be read
> >      correctly.
> >    - Since Hudi already has index types such as the Bucket index, which
> >      map keys to buckets consistently, data duplicates can be eliminated.
> >
> >
> > Thanks,
> > Jian Feng
> >
>
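
To make the duplicate scenario in the quoted proposal concrete, here is a minimal Python sketch. It is a toy model, not Hudi code: `bucket_file_id`, `new_file_id_for_insert` and `has_file_id_conflict` are hypothetical names, the MD5-based hashing is an assumption (Hudi's Bucket index uses its own hash function), and bloom-index insert routing is simplified to "each writer puts a new key into a file group of its own choosing".

```python
import hashlib
import uuid

NUM_BUCKETS = 8  # hypothetical fixed bucket count for the table


def bucket_file_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Toy Bucket-index routing: the file group is a pure function of the key,
    so every writer maps the same key to the same FileID."""
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    return f"bucket-{int(digest, 16) % num_buckets:08d}"


def new_file_id_for_insert(record_key: str) -> str:
    """Toy bloom-index insert routing: a brand-new key is not found by the index,
    so each writer places it in a file group of its own choosing (here, a fresh one).
    The key is deliberately ignored, which is the whole problem."""
    return f"fg-{uuid.uuid4()}"


def has_file_id_conflict(file_ids_a: set, file_ids_b: set) -> bool:
    """OCC-style check: two concurrent commits conflict only if they touch a common FileID."""
    return bool(file_ids_a & file_ids_b)


if __name__ == "__main__":
    key = "user-42"  # the same new record key arrives at two writers

    # Bloom-index-like routing: no shared FileID, so both commits succeed
    # and the table ends up with two copies of the key.
    writer_a = {new_file_id_for_insert(key)}
    writer_b = {new_file_id_for_insert(key)}
    print(has_file_id_conflict(writer_a, writer_b))  # False

    # Bucket-index-like routing: both writers hit the same FileID,
    # so the overlap is visible to FileID-level checks.
    writer_a = {bucket_file_id(key)}
    writer_b = {bucket_file_id(key)}
    print(has_file_id_conflict(writer_a, writer_b))  # True
```

In the lock-free design the point is not to detect this overlap and roll back, but to rely on it: with a consistent key-to-bucket mapping, both copies of the key land in the same file group, where they can be merged on read.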

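A similarly hedged sketch of the proposed Merge-on-Read write path: each writer appends only to its own log file in a file group, so writers never share a file, and readers merge records by key with preCombine-style logic that keeps the record with the larger ordering value. `Record`, `FileGroupLogs`, `pre_combine` and the `log.<writer_id>` naming are hypothetical; real Hudi log files, marker files and payload classes differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    key: str
    ordering_val: int  # hypothetical precombine field, e.g. an event timestamp
    value: str


def pre_combine(older: Record, newer: Record) -> Record:
    """Latest-wins merge: keep the record with the larger ordering value.
    On a tie, the record read later wins."""
    return newer if newer.ordering_val >= older.ordering_val else older


class FileGroupLogs:
    """Delta log files of one file group (bucket) in a Merge-on-Read table."""

    def __init__(self) -> None:
        # log file name -> records appended to that file
        self.log_files: Dict[str, List[Record]] = {}

    def append(self, writer_id: str, records: List[Record]) -> None:
        # In this model each writer appends only to its own (hypothetically named)
        # log file, so no two writers ever write to the same file.
        self.log_files.setdefault(f"log.{writer_id}", []).extend(records)

    def read_merged(self) -> Dict[str, Record]:
        # Readers merge every log file of the file group by record key.
        merged: Dict[str, Record] = {}
        for name in sorted(self.log_files):
            for rec in self.log_files[name]:
                prev = merged.get(rec.key)
                merged[rec.key] = rec if prev is None else pre_combine(prev, rec)
        return merged


if __name__ == "__main__":
    logs = FileGroupLogs()
    # Two writers ingest the same new key into the same bucket concurrently.
    logs.append("writer-1", [Record("user-42", 100, "v1")])
    logs.append("writer-2", [Record("user-42", 105, "v2"), Record("user-7", 90, "x")])

    merged = logs.read_merged()
    print(merged["user-42"].value)  # v2
    print(len(merged))  # 2
```

Because `pre_combine` keeps the larger ordering value, the merged result does not depend on which writer's log file is read first (ties aside). That order independence is what lets writers append without coordinating.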

-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure
