Also an update from +Nam Bui <nam....@polidea.com> : "This commit [1] is up-to-date. So I walked through all of markdown files. Apart from Syntax changes between Jekyll & Hugo, if there were any differences in contents regarding to removed/added/modified, I would have double check the text with the current website. I can make sure that now 98-99% (probably 100% if I don’t miss any typos) of the contents are synced and 0 lost files. Thus, there should be no unexpected content changes which appears in this commit [1] anymore.
[1] https://github.com/apache/beam/pull/11554/commits/94e624fb43aa2218150fd3d97333c58d3d9ff653 " On Fri, May 8, 2020 at 9:57 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote: > I understand the difficulty, and this certainly comes with lessons learned > for future similar projects. > > To your questions Robert: > (1 and 2) I will commit to review the text in the resulting pages. I will > try and use some automation to extract visible text from each page and diff > it with the current state of the website. I can do this starting next week. > From some quick research, there seem to be tools that help with this > analysis ( > https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal > ) > By remaining in this state, we hold others up from making changes, or we > increase the amount of work needed after merging to port over changes that > may be missed. If we move forward, new changes can be done on top of the > new website. > > (3) This makes sense. Brian, would you be able to spend some time to look > at the automation changes (build files and scripts) to ensure they look > fine? > > I would also like to write a post mortem to extract lessons learned and > avoid this situation in the future. > > > On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com> wrote: > >> I'm -0 on merging as-is. I have the same concerns as Robert and he's >> voiced them very well so I won't waste time re-airing them. >> >> (2) I spot checked the content, pulled out some common patterns, and >>> it mostly looks good, but there were also some issues (e.g. several >>> pages were replaced with the contents from entirely different pages). >>> I would be more comfortable if, say, a smoke test of comparing the old >>> and new sites, with html tags stripped and ignoring whitespace, >>> yielded what should be empty diffs. >>> >> >> Can you share any details about this analysis? >> >> +1 for verifying the old and new are the same by diffing the output. >> >> >>> (3) It'd be good to have someone give a stamp of approval on the >>> infrastructure changes, at least to validate that we're not going to >>> be taking on extra tech debt with regard to jenkins stability and >>> developer workflow. I see that Brian has at least looked at this some. >> >> >> My involvement so far was just recognizing a problem (creating root-owned >> files on jenkins workers) and helping to fix it. If there's anyone >> available who's familiar with the website infrastructure it would be great >> if they could take a look instead (if not I could probably acquaint myself >> enough to review). >> >> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com> >> wrote: >> >>> This is a tough situation. >>> >>> It would have been much better if this transition was structured in >>> such a way that the review was more manageable (e.g. the suggestion of >>> scripts, not mixing in voluminous unnecessary changes like whitespace, >>> and not updating content), and possibly even incrementally (e.g. the >>> new site would have been developed over multiple PRs in a subdomain or >>> subdirectory while being worked on). But hindsight is 20/20 and no >>> one, myself included, thought to bring this up when the original >>> migration was proposed, so this is more something to keep in mind for >>> the future. I also appreciate the efforts that have been made to clean >>> things up (e.g. preserving history) and address feedback. >>> >>> So, where do we go from here? My first thought is that I really don't >>> want to set a precedent that just because a PR "will require a large >>> effort" and in a state that if we don't "move forward and merge what >>> we have now" then "work done so far will be lost" means that we think >>> it's OK to forgo doing a proper review. >>> >>> On the other hand, there are some mitigating factors with this being >>> the website and not the code in that "bugs," though possibly >>> embarrassing, won't break production pipelines or data loss, and >>> though the source is technically part of the release, when we find >>> something to fix we can fix the live website much more quickly than go >>> through the whole release process and convince people to upgrade. (I >>> recognize accepting this argument is, to some degree at least, saying >>> that we don't care about the correctness of docs as much as so-called >>> "real" code, if we go there.) >>> >>> If we decide to go ahead and merge (and I would not object), there are >>> some things I would like to see. >>> >>> (1) I would like to understand what we would do afterwards to "review >>> the outcome, and ensure that all the content is there," and why it >>> can't be done before merging instead. (Is it because it'd take time >>> and we don't want to incorporate changes that are made to the website >>> in the meantime? I think that boat has sailed, but maybe we can avoid >>> making it worse...) >>> >>> (2) I spot checked the content, pulled out some common patterns, and >>> it mostly looks good, but there were also some issues (e.g. several >>> pages were replaced with the contents from entirely different pages). >>> I would be more comfortable if, say, a smoke test of comparing the old >>> and new sites, with html tags stripped and ignoring whitespace, >>> yielded what should be empty diffs. >>> >>> (3) It'd be good to have someone give a stamp of approval on the >>> infrastructure changes, at least to validate that we're not going to >>> be taking on extra tech debt with regard to jenkins stability and >>> developer workflow. I see that Brian has at least looked at this some. >>> >>> - Robert >>> >>> >>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy >>> <aizha...@apache.org> wrote: >>> > >>> > Thank you Ahmet. >>> > >>> > Robert/Brian, what do you think? >>> > >>> > The website staging and pre commit tests have passed [1]. If nobody >>> has objections, we could merge it soon. >>> > >>> > >>> > [1] https://github.com/apache/beam/pull/11554 >>> > >>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com> wrote: >>> >> >>> >> >>> >> >>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy < >>> aizha...@apache.org> wrote: >>> >>> >>> >>> Thanks for the writeup Ahmet. >>> >>> >>> >>> My bias is to move forward and merge the PR. After this, we'll >>> review the outcome, and ensure that all the content is there. Nam will help >>> us with that. >>> >>> The reason that I'd like to move forward and merge what we have now >>> - is that if we don't do that, the work done so far will be lost. >>> >>> We'll make sure to stage the website in its current state, and use >>> that as reference/archive to ensure all the content have been moved. >>> >>> >>> >>> Is this reasonable to everyone? >>> >> >>> >> >>> >> This is reasonable to me. I agree with your reasons. >>> >> >>> >> What do others think? >>> >> >>> >>> >>> >>> >>> >>> >>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote: >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy < >>> aizha...@apache.org> wrote: >>> >>>>>> >>> >>>>>> >>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test >>> Failures. >>> >>>>>> >>> >>>>>> That and finishing the review. (Is someone tracking/coordinating >>> this?) >>> >>>>> >>> >>>>> >>> >>>>> I am coordinating the work on the failed tests, but I would need >>> other committer's help to perform the review. @Ahmet, could you help us >>> prioritize the review for this PR? >>> >>>> >>> >>>> >>> >>>> The problem is there are too many manual changes. Reviewing this >>> change in this form will require a large effort. I do not think I can >>> interrupt other projects to prioritize reviews on this PR. IMO, we have a >>> few options: >>> >>>> >>> >>>> - PR to be restructured in the format suggested in this thread. A >>> commit for infrastructure changes from Jekyll to hugo. A second commit for >>> a script that will convert the majority of the content. A third commit for >>> the execution of the script. And a fourth commit for the additional manual >>> content changes. If Nam can get to this form, people on this thread >>> myself/Robert/Pablo/Brian can review the changes. >>> >>>> - Another option is, we can accept that we already invested in this >>> transition and overall this is a good change, and merge the PR more or less >>> in its current form (with tests fixed and open comments addressed) even >>> though it has issues. And then overtime fix the issues we encounter. There >>> was already some amount of review and visual comparisons, we risk losing >>> some recent content changes but I am assuming this will not be much. If Nam >>> can commit to compare two sites after a merge, fixing the majority of the >>> delta, this might be a viable option. >>> >>>> >>> >>>> Another thing we can do, we can archive/store a read-only copy of >>> the current website in an "archive" url temporarily instead of completely >>> deleting it. It will give us a baseline for a while to go back to the old >>> content and move any missing data. (And maybe, someone can come up with an >>> innovative way to compare the textual content of both sites.) A note on the >>> stop world approach, I believe we are already failing on that with merge >>> conflicts showing up on the PR. It will be better for us to complete the >>> transition as soon as possible. Fixing after the initial merge might be a >>> simpler task, especially if we can archive the old site. >>> >>>> >>> >>>>> >>> >>>>> >>> >>>>>> >>> >>>>>> > Michal showed Nam how to handle the 1st test which was about >>> Apache License missing. >>> >>>>>> > >>> >>>>>> > However, the 2nd and 3rd tests looked like some kind of >>> permissions error on the Jenkins worker, not to be configured by code. For >>> more details based on Jenkin logs, the 2nd test failed because of >>> website/www/site/themes and the 3rd test failed because of >>> website/www/node_modules, they are both auto-generated files on build. Can >>> someone help Nam to look into this? >>> >>>>>> > >>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE >>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE >>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE >>> >>>>>> > >>> >>>>>> > 2) Are there any other blockers for merging? >>> @Ahmet/Robert/others please share if there are any other blockers. >>> >>>>>> > >>> >>>>>> > >>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494 >>> >>>>>> > >>> >>>>>> > >>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw < >>> rober...@google.com> wrote: >>> >>>>>> >> >>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <al...@google.com> >>> wrote: >>> >>>>>> >> > >>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw < >>> rober...@google.com> wrote: >>> >>>>>> >> >>> >>> >>>>>> >> >>> I took the massive commit and split it up into: >>> >>>>>> >> >>> >>> >>>>>> >> >>> (1) Infrastructure changes (basically everything outside of >>> >>>>>> >> >>> (website/www/site/content) >>> >>>>>> >> >>> (2) Sed script changes, and >>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)). >>> >>>>>> >> > >>> >>>>>> >> > >>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the >>> source of the sed script? I am not sure why some of those lines are there. >>> It would be much easier for us to comment on the script source if it is >>> reviewable somewhere. >>> >>>>>> >> >>> >>>>>> >> I just gathered up common patterns as I was trying to go >>> through and >>> >>>>>> >> review the files... Mostly it was an exercise in finding a >>> compact >>> >>>>>> >> representation for the delta, not trying to be a perfect >>> conversion. >>> >>>>>> >> (I do think in retrospect, if we do something like this again, >>> it >>> >>>>>> >> would be preferable to commit a script that does the >>> auto-conversion >>> >>>>>> >> (maybe even with some patch files for manual changes) both for >>> ease of >>> >>>>>> >> reviewing and to avoid the stop-the-world situation we're in >>> now. (I'm >>> >>>>>> >> still worried that some changes will get lost in the shuffle.) >>> >>