PR to remove replace branch: https://github.com/apache/paimon/pull/3618
On Thu, Jun 27, 2024 at 1:21 PM Jingsong Li <[email protected]> wrote:
>
> Thanks Yong for your feedback.
>
> Sounds good~
>
> Best,
> Jingsong
>
> On Thu, Jun 27, 2024 at 12:24 PM Yong Fang <[email protected]> wrote:
> >
> > I reviewed my original proposal and found that 'replaceBranch' is mainly an
> > optimization for operations between Paimon branches and Flink jobs. It can be
> > replaced with: stop job -> merge branch -> restart job.
> >
> > Relatively speaking, this is a low-frequency operation. We can remove it
> > first, and consider adding it to the appropriate position when needed in
> > the future, without adding additional IO. WDYT?
> >
> > On Wednesday, June 26, 2024, Jingsong Li <[email protected]> wrote:
> >
> > > Hi Shammon,
> > >
> > > After some implementation, I discovered an issue:
> > >
> > > replace_branch incurs an expensive IO overhead for most operations in
> > > the normal code path. For HDFS, it is a namenode access, and for
> > > object storage, it is billed separately.
> > >
> > > This is difficult to accept, and if replace_branch is not as useful, I
> > > suggest removing this operation.
> > >
> > > If we remove replace_branch, can we consider changing the name of
> > > merge_branch, for example to fast_forward, which seems more
> > > appropriate to its original meaning?
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, Sep 29, 2023 at 2:25 AM Jingsong Li <[email protected]> wrote:
> > > >
> > > > Thanks Shammon for driving.
> > > >
> > > > Sounds good to me to start a voting process.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Sep 25, 2023 at 7:14 PM Shammon FY <[email protected]> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for all the valuable feedback. If there are no more comments, I will
> > > > > start a vote for this PIP in the next 2 days.
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > > On Thu, Sep 21, 2023 at 5:19 PM Shammon FY <[email protected]> wrote:
> > > > >
> > > > > > The feature `Replace Main With Branch` is used in duplicate data
> > > > > > correction without modifying jobs. For example:
> > > > > >
> > > > > > 1. We can create branches with the same name for a series of Paimon tables
> > > > > > 2. Re-submit all streaming jobs to read and write these branches for the tables
> > > > > > 3. After the data in the branch has caught up with main, we can stop all the
> > > > > >    jobs which read and write the main branch
> > > > > > 4. Replace the main branch with the created branch; we don't need to do
> > > > > >    anything with the jobs that read and write the specified branch
> > > > > >
> > > > > > We cannot `Merge Branch To Main` here because the correct jobs will still
> > > > > > read and write the branches, which will be completely independent of main.
> > > > > >
> > > > > > Best,
> > > > > > Shammon FY
> > > > > >
> > > > > > On Thu, Sep 21, 2023 at 12:21 AM Jingsong Li <[email protected]> wrote:
> > > > > >
> > > > > > > Can you explain more about "Replace Main With Branch"?
> > > > > > >
> > > > > > > Does this need to be implemented?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Tue, Sep 19, 2023 at 2:17 PM Shammon FY <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi ConradJam,
> > > > > > > >
> > > > > > > > How to handle data conflicts between the main branch and branches is a
> > > > > > > > complex problem. At present, we would like to replace data in main with
> > > > > > > > the branch directly. You can think of it this way: during merge and replace
> > > > > > > > operations, the data after the specified tag in the main branch will be
> > > > > > > > deleted and then the data after the tag in the branch will be used in main.
> > > > > > > >
> > > > > > > > We can consider "merge" conflicting data in the future when we meet
> > > > > > > > these requirements.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Shammon FY
> > > > > > > >
> > > > > > > > On Tue, Sep 19, 2023 at 10:50 AM ConradJam <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > +1 This feature looks a bit like Git’s branch management. If this is really
> > > > > > > > > the case, how do we solve the data conflict when merging branches? Do we
> > > > > > > > > need the user to specify that a certain branch's data shall prevail?
> > > > > > > > >
> > > > > > > > > On Mon, Sep 18, 2023 at 20:06, Shammon FY <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jingsong,
> > > > > > > > > >
> > > > > > > > > > I have updated PIP-9 to explain that the main `Snapshot`, `Schema` and
> > > > > > > > > > `Tag` will exist in the base directory by default, the same as the
> > > > > > > > > > current directory structure. Thanks
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shammon FY
> > > > > > > > > >
> > > > > > > > > > On Fri, Sep 15, 2023 at 10:32 AM Shammon FY <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jingsong,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your suggestion, it sounds good to me. Currently I only
> > > > > > > > > > > mentioned it in the `Compatibility` section; I'll update the PIP to
> > > > > > > > > > > explain this more clearly.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Shammon FY
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Sep 13, 2023 at 12:26 PM Jingsong Li <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks Shammon for the proposal!
> > > > > > > > > > > >
> > > > > > > > > > > > It looks very good!
> > > > > > > > > > > >
> > > > > > > > > > > > I don't get the main branch file.
> > > > > > > > > > > >
> > > > > > > > > > > > Can we keep the main branch as it is? Just put snapshot/ tag/ schema/
> > > > > > > > > > > > in the table root directory.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Jingsong
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 12, 2023 at 3:55 PM Shammon FY <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi devs,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would like to start a discussion about PIP-9: Support Branch [1]. Branch
> > > > > > > > > > > > > in Paimon will help us deal with data correction without copying all data
> > > > > > > > > > > > > from original tables, and it can also enhance Tag for Paimon like
> > > > > > > > > > > > > traditional Hive partition tables, providing data correction capabilities
> > > > > > > > > > > > > on the basis of Tag.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Looking forward to your feedback, thanks!
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-9%3A+Support+Branch
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Shammon FY
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best
> > > > > > > > >
> > > > > > > > > ConradJam
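
For readers following the merge semantics described in the Sep 19, 2023 reply ("the data after the specified tag in the main branch will be deleted and then the data after the tag in the branch will be used in the main"), here is a minimal toy sketch. It assumes a table's history can be modeled as an ordered list of snapshot ids; the function name `fast_forward` borrows the rename suggested in the thread, but the signature and the list-based model are hypothetical and are not Paimon's actual API or storage layout.

```python
from typing import List


def fast_forward(main: List[str], branch: List[str], tag: str) -> List[str]:
    """Toy model only (hypothetical, not Paimon's API).

    main   -- snapshot ids on the main branch, oldest first
    branch -- snapshot ids written on the branch after it was created from `tag`
    tag    -- the snapshot id in main that the branch was created from
    """
    if tag not in main:
        raise ValueError(f"tag {tag!r} does not point at a snapshot in main")
    # Keep main up to and including the tagged snapshot ...
    cut = main.index(tag) + 1
    # ... drop everything main wrote after the tag, and use the branch's data instead.
    return main[:cut] + list(branch)


if __name__ == "__main__":
    main_history = ["s1", "s2", "s3", "s4"]
    branch_history = ["b1", "b2"]
    print(fast_forward(main_history, branch_history, tag="s2"))
```

In this model no conflict resolution takes place: snapshots written to main after the tag are simply discarded, which matches the "replace data in main with branch directly" behavior described in the reply.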
