I reviewed my original proposal and found that `replaceBranch` is mainly an optimization for operations between Paimon branches and Flink jobs. It can be replaced with: stop job -> merge branch -> restart job.
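
A rough sketch of that sequence, using hypothetical `Job` and `Table` stand-ins rather than the real Flink or Paimon APIs. The merge semantics here (main simply takes over the branch's snapshots) are an assumption made for illustration only:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical stand-in for a Flink job that reads or writes a table."""
    name: str
    running: bool = True


@dataclass
class Table:
    """Hypothetical stand-in for a Paimon table; snapshots are plain labels."""
    branches: Dict[str, List[str]] = field(default_factory=dict)

    def merge_branch(self, branch: str) -> None:
        # Assumed semantics for this sketch: main takes over the branch's snapshots.
        self.branches["main"] = list(self.branches[branch])


def switch_main_to_branch(table: Table, jobs: List[Job], branch: str) -> List[str]:
    """Stop the affected jobs, merge the branch into main, then restart the jobs."""
    log: List[str] = []
    for job in jobs:
        job.running = False
        log.append(f"stop {job.name}")
    table.merge_branch(branch)
    log.append(f"merge {branch}")
    for job in jobs:
        job.running = True
        log.append(f"restart {job.name}")
    return log
```

The point is only that no table-level replace operation is involved: the extra work moves to the low-frequency stop/restart step instead of the normal read and write path.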
Relatively speaking, this is a low-frequency operation. We can remove it for now and consider adding it back in an appropriate place when it is needed in the future, without adding additional IO. WDYT?

On Wednesday, June 26, 2024, Jingsong Li <[email protected]> wrote:

> Hi Shammon,
>
> After some implementation, I discovered an issue:
>
> replace_branch incurs an expensive IO overhead for most operations in
> the normal code path. For HDFS, it is a namenode access, and for
> object storage, it is billed separately.
>
> This is difficult to accept, and if replace_branch is not that useful, I
> suggest removing this operation.
>
> If we remove replace_branch, can we consider changing the name of
> merge_branch, for example to fast_forward, which seems more
> appropriate to its original meaning?
>
> Best,
> Jingsong
>
> On Fri, Sep 29, 2023 at 2:25 AM Jingsong Li <[email protected]> wrote:
> >
> > Thanks Shammon for driving.
> >
> > Sounds good to me to start a voting process.
> >
> > Best,
> > Jingsong
> >
> > On Mon, Sep 25, 2023 at 7:14 PM Shammon FY <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > Thanks for all the valuable feedback. If there are no more comments, I
> > > will start a vote for this PIP in the next 2 days.
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > > On Thu, Sep 21, 2023 at 5:19 PM Shammon FY <[email protected]> wrote:
> > >
> > > > The feature `Replace Main With Branch` is used in duplicate data
> > > > correction without modifying jobs. For example:
> > > >
> > > > 1. We can create branches with the same name for a series of Paimon
> > > > tables
> > > > 2. Re-submit all streaming jobs to read and write these branches for
> > > > the tables
> > > > 3. After the data in the branch has caught up with main, we can stop
> > > > all the jobs which read and write the main branch
> > > > 4. Replace the main branch with the created branch; we don't need to
> > > > do anything with the jobs that read and write the specified branch
> > > >
> > > > We cannot `Merge Branch To Main` here because the correct jobs will
> > > > still read and write the branches, which will be completely
> > > > independent of main.
> > > >
> > > > Best,
> > > > Shammon FY
> > > >
> > > >
> > > > On Thu, Sep 21, 2023 at 12:21 AM Jingsong Li <[email protected]>
> > > > wrote:
> > > >
> > > >> Can you explain more about "Replace Main With Branch"?
> > > >>
> > > >> Does this need to be implemented?
> > > >>
> > > >> Best,
> > > >> Jingsong
> > > >>
> > > >> On Tue, Sep 19, 2023 at 2:17 PM Shammon FY <[email protected]> wrote:
> > > >> >
> > > >> > Hi ConradJam,
> > > >> >
> > > >> > How to handle data conflicts between the main branch and branches
> > > >> > is a complex problem. At present, we would like to replace data in
> > > >> > main with the branch directly. You can think of it this way: during
> > > >> > merge and replace operations, the data after the specified tag in
> > > >> > the main branch will be deleted, and then the data after the tag in
> > > >> > the branch will be used in main.
> > > >> >
> > > >> > We can consider "merging" conflicting data in the future when we
> > > >> > meet these requirements.
> > > >> >
> > > >> > Best,
> > > >> > Shammon FY
> > > >> >
> > > >> > On Tue, Sep 19, 2023 at 10:50 AM ConradJam <[email protected]> wrote:
> > > >> >
> > > >> > > +1 This feature looks a bit like Git's branch management. If this
> > > >> > > is really the case, how do we solve the data conflict when merging
> > > >> > > branches? Do we need the user to specify that a certain branch's
> > > >> > > data shall prevail?
> > > >> > >
> > > >> > > On Mon, Sep 18, 2023 at 20:06, Shammon FY <[email protected]> wrote:
> > > >> > >
> > > >> > > > Hi Jingsong,
> > > >> > > >
> > > >> > > > I have updated PIP-9 to explain that the main `Snapshot`,
> > > >> > > > `Schema` and `Tag` will exist in the base directory by default,
> > > >> > > > just the same as the current directory structure. Thanks
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Shammon FY
> > > >> > > >
> > > >> > > >
> > > >> > > > On Fri, Sep 15, 2023 at 10:32 AM Shammon FY <[email protected]>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi Jingsong,
> > > >> > > > >
> > > >> > > > > Thanks for your suggestion, it sounds good to me. Currently I
> > > >> > > > > only mentioned it in the `Compatibility` section; I'll update
> > > >> > > > > the PIP to explain this more clearly.
> > > >> > > > >
> > > >> > > > > Best,
> > > >> > > > > Shammon FY
> > > >> > > > >
> > > >> > > > > On Wed, Sep 13, 2023 at 12:26 PM Jingsong Li <[email protected]>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > >> Thanks Shammon for the proposal!
> > > >> > > > >>
> > > >> > > > >> It looks very good!
> > > >> > > > >>
> > > >> > > > >> I don't get the main branch file.
> > > >> > > > >>
> > > >> > > > >> Can we keep the main branch as it is? Just put snapshot/ tag/
> > > >> > > > >> schema/ in the table root directory.
> > > >> > > > >>
> > > >> > > > >> Best,
> > > >> > > > >> Jingsong
> > > >> > > > >>
> > > >> > > > >> On Tue, Sep 12, 2023 at 3:55 PM Shammon FY <[email protected]>
> > > >> > > > >> wrote:
> > > >> > > > >> >
> > > >> > > > >> > Hi devs,
> > > >> > > > >> >
> > > >> > > > >> > I would like to start a discussion about PIP-9: Support
> > > >> > > > >> > Branch [1]. Branch in Paimon will help us deal with data
> > > >> > > > >> > correction without copying all data from the original
> > > >> > > > >> > tables, and it can also enhance Tag for Paimon like
> > > >> > > > >> > traditional Hive partition tables, providing data
> > > >> > > > >> > correction capabilities on the basis of Tag.
> > > >> > > > >> >
> > > >> > > > >> > Looking forward to your feedback, thanks!
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > [1]
> > > >> > > > >> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-9%3A+Support+Branch
> > > >> > > > >> >
> > > >> > > > >> > Best,
> > > >> > > > >> > Shammon FY
> > > >> > > > >>
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Best
> > > >> > >
> > > >> > > ConradJam
> > > >> > >
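
For reference, the merge semantics described earlier in this thread (the data after the specified tag in main is deleted, then the data after the tag in the branch is used in main) can be sketched as below. The name `fast_forward` follows the rename suggested in the thread; modelling snapshots as a list of labels is an assumption for illustration, not how Paimon stores them:

```python
from typing import List


def fast_forward(main: List[str], tag: str, branch_after_tag: List[str]) -> List[str]:
    """Return main's snapshots after fast-forwarding to a branch created from `tag`.

    main: snapshot labels on main, oldest first.
    tag: the snapshot label the branch was created from; must exist in main.
    branch_after_tag: snapshots written on the branch after the tag, oldest first.
    """
    if tag not in main:
        raise ValueError(f"tag {tag!r} is not a snapshot of main")
    cut = main.index(tag)
    # Drop everything main wrote after the tag, then continue with the branch's data.
    return main[: cut + 1] + list(branch_after_tag)
```

For example, with main at `["s1", "s2", "s3", "s4"]`, a tag on `"s2"`, and branch snapshots `["b3", "b4", "b5"]`, the result keeps `s1` and `s2` and continues with the three branch snapshots; `s3` and `s4` are discarded rather than merged.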
