Hi, @Ahmet: Yeah, it's all clear to me. :) @Robert: Thanks for your ideas and also the script. It really helps me to serve my works.
Best regard! On Sat, May 9, 2020 at 2:10 AM Ahmet Altay <al...@google.com> wrote: > This sounds reasonable to me. Thank you. Nam, does it make sense to you? > > On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw <rober...@google.com> > wrote: > >> I'd really like to not see this work go to waste, both the original >> revision, the further efforts Nam has done in making it more manageable to >> review, and the work put into reviewing this so far, so we can get the >> benefits of being on Hugo. How about this for a concrete proposal: >> >> (1) We get "standard" approval from one or more committers for the >> infrastructure changes, just as with any other PR. Brian has >> already started this, but if others could step up as well that'd be great. >> >> (2) Reviewers (and authors) typically count on (or request) sufficient >> automated test coverage to augment the fact that their eyeballs are >> fallible, which is something that is missing here (and given the size of >> the change not easily compensated for by a more detailed manual review). >> How about we use the script above (or similar) as an automated test to >> validate the website's contents haven't (materially) changed. I feel we've >> validated enough that the style looks good via spot checking (which is >> something that should work on all pages if it works on one). The diff >> between the current site and the newly generated site should be empty (it >> might already be [1]), or at least we should get a stamp of approval on the >> plain-text diff (which should be small), before merging. >> >> (3) To make things easier, everyone holds off on making any changes to >> the old site until a fixed future date (say, next Wednesday). Hopefully we >> can get it merged by then. If not, a condition for merging would be a >> commitment incorporating new changes after this date. >> >> Does this sound reasonable? >> >> - Robert >> >> >> >> [1] I'd be curious as to how small the diff already is, but my script >> relies on local directories with the generated HTML, which I don't have >> handy at the moment. >> >> >> >> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw <rober...@google.com> >> wrote: >> >>> Here's a script that we could run on the old and new sites that should >>> quickly catch any major issues but not get caught up in formatting minutia. >>> >>> >>> >>> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw <rober...@google.com> >>> wrote: >>> >>>> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy < >>>> aizha...@apache.org> wrote: >>>> >>>>> I understand the difficulty, and this certainly comes with lessons >>>>> learned for future similar projects. >>>>> >>>>> To your questions Robert: >>>>> (1 and 2) I will commit to review the text in the resulting pages. I >>>>> will try and use some automation to extract visible text from each page >>>>> and >>>>> diff it with the current state of the website. I can do this starting next >>>>> week. From some quick research, there seem to be tools that help with this >>>>> analysis ( >>>>> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal >>>>> ) >>>>> >>>> >>>> At first glance it looks like these tools would give diffs that are >>>> *larger* than the 47K one we're struggling to review here. >>>> >>>> >>>>> By remaining in this state, we hold others up from making changes, or >>>>> we increase the amount of work needed after merging to port over changes >>>>> that may be missed. If we move forward, new changes can be done on top of >>>>> the new website. >>>>> >>>> >>>> I agree we don't want to hold others up from making changes. However, >>>> the amount of work to port changes over seems small in comparison to >>>> everything else that is being discussed here. (It also provides good >>>> incentives to reach the bar quickly and has the advantage of falling on the >>>> right people.) (3) will still take some time. >>>> >>>> If we go this route, we're lowering the bar for doc changes, but not >>>> removing it. >>>> >>>> >>>>> (3) This makes sense. Brian, would you be able to spend some time to >>>>> look at the automation changes (build files and scripts) to ensure they >>>>> look fine? >>>>> >>>>> I would also like to write a post mortem to extract lessons learned >>>>> and avoid this situation in the future. >>>>> >>>>> >>>>> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com> >>>>> wrote: >>>>> >>>>>> I'm -0 on merging as-is. I have the same concerns as Robert and he's >>>>>> voiced them very well so I won't waste time re-airing them. >>>>>> >>>>>> (2) I spot checked the content, pulled out some common patterns, and >>>>>>> it mostly looks good, but there were also some issues (e.g. several >>>>>>> pages were replaced with the contents from entirely different pages). >>>>>>> I would be more comfortable if, say, a smoke test of comparing the >>>>>>> old >>>>>>> and new sites, with html tags stripped and ignoring whitespace, >>>>>>> yielded what should be empty diffs. >>>>>>> >>>>>> >>>>>> Can you share any details about this analysis? >>>>>> >>>>> >>>> It was basically paging through the diff, adding things to the sed >>>> script, and then looking at more diffs. >>>> >>>> >>>>> +1 for verifying the old and new are the same by diffing the output. >>>>>> >>>>>> >>>>>>> (3) It'd be good to have someone give a stamp of approval on the >>>>>>> infrastructure changes, at least to validate that we're not going to >>>>>>> be taking on extra tech debt with regard to jenkins stability and >>>>>>> developer workflow. I see that Brian has at least looked at this >>>>>>> some. >>>>>> >>>>>> >>>>>> My involvement so far was just recognizing a problem (creating >>>>>> root-owned files on jenkins workers) and helping to fix it. If there's >>>>>> anyone available who's familiar with the website infrastructure it would >>>>>> be >>>>>> great if they could take a look instead (if not I could probably acquaint >>>>>> myself enough to review). >>>>>> >>>>>> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com> >>>>>> wrote: >>>>>> >>>>>>> This is a tough situation. >>>>>>> >>>>>>> It would have been much better if this transition was structured in >>>>>>> such a way that the review was more manageable (e.g. the suggestion >>>>>>> of >>>>>>> scripts, not mixing in voluminous unnecessary changes like >>>>>>> whitespace, >>>>>>> and not updating content), and possibly even incrementally (e.g. the >>>>>>> new site would have been developed over multiple PRs in a subdomain >>>>>>> or >>>>>>> subdirectory while being worked on). But hindsight is 20/20 and no >>>>>>> one, myself included, thought to bring this up when the original >>>>>>> migration was proposed, so this is more something to keep in mind for >>>>>>> the future. I also appreciate the efforts that have been made to >>>>>>> clean >>>>>>> things up (e.g. preserving history) and address feedback. >>>>>>> >>>>>>> So, where do we go from here? My first thought is that I really don't >>>>>>> want to set a precedent that just because a PR "will require a large >>>>>>> effort" and in a state that if we don't "move forward and merge what >>>>>>> we have now" then "work done so far will be lost" means that we think >>>>>>> it's OK to forgo doing a proper review. >>>>>>> >>>>>>> On the other hand, there are some mitigating factors with this being >>>>>>> the website and not the code in that "bugs," though possibly >>>>>>> embarrassing, won't break production pipelines or data loss, and >>>>>>> though the source is technically part of the release, when we find >>>>>>> something to fix we can fix the live website much more quickly than >>>>>>> go >>>>>>> through the whole release process and convince people to upgrade. (I >>>>>>> recognize accepting this argument is, to some degree at least, saying >>>>>>> that we don't care about the correctness of docs as much as so-called >>>>>>> "real" code, if we go there.) >>>>>>> >>>>>>> If we decide to go ahead and merge (and I would not object), there >>>>>>> are >>>>>>> some things I would like to see. >>>>>>> >>>>>>> (1) I would like to understand what we would do afterwards to "review >>>>>>> the outcome, and ensure that all the content is there," and why it >>>>>>> can't be done before merging instead. (Is it because it'd take time >>>>>>> and we don't want to incorporate changes that are made to the website >>>>>>> in the meantime? I think that boat has sailed, but maybe we can avoid >>>>>>> making it worse...) >>>>>>> >>>>>>> (2) I spot checked the content, pulled out some common patterns, and >>>>>>> it mostly looks good, but there were also some issues (e.g. several >>>>>>> pages were replaced with the contents from entirely different pages). >>>>>>> I would be more comfortable if, say, a smoke test of comparing the >>>>>>> old >>>>>>> and new sites, with html tags stripped and ignoring whitespace, >>>>>>> yielded what should be empty diffs. >>>>>>> >>>>>>> (3) It'd be good to have someone give a stamp of approval on the >>>>>>> infrastructure changes, at least to validate that we're not going to >>>>>>> be taking on extra tech debt with regard to jenkins stability and >>>>>>> developer workflow. I see that Brian has at least looked at this >>>>>>> some. >>>>>>> >>>>>>> - Robert >>>>>>> >>>>>>> >>>>>>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy >>>>>>> <aizha...@apache.org> wrote: >>>>>>> > >>>>>>> > Thank you Ahmet. >>>>>>> > >>>>>>> > Robert/Brian, what do you think? >>>>>>> > >>>>>>> > The website staging and pre commit tests have passed [1]. If >>>>>>> nobody has objections, we could merge it soon. >>>>>>> > >>>>>>> > >>>>>>> > [1] https://github.com/apache/beam/pull/11554 >>>>>>> > >>>>>>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com> >>>>>>> wrote: >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy < >>>>>>> aizha...@apache.org> wrote: >>>>>>> >>> >>>>>>> >>> Thanks for the writeup Ahmet. >>>>>>> >>> >>>>>>> >>> My bias is to move forward and merge the PR. After this, we'll >>>>>>> review the outcome, and ensure that all the content is there. Nam will >>>>>>> help >>>>>>> us with that. >>>>>>> >>> The reason that I'd like to move forward and merge what we have >>>>>>> now - is that if we don't do that, the work done so far will be lost. >>>>>>> >>> We'll make sure to stage the website in its current state, and >>>>>>> use that as reference/archive to ensure all the content have been moved. >>>>>>> >>> >>>>>>> >>> Is this reasonable to everyone? >>>>>>> >> >>>>>>> >> >>>>>>> >> This is reasonable to me. I agree with your reasons. >>>>>>> >> >>>>>>> >> What do others think? >>>>>>> >> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com> >>>>>>> wrote: >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy < >>>>>>> aizha...@apache.org> wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test >>>>>>> Failures. >>>>>>> >>>>>> >>>>>>> >>>>>> That and finishing the review. (Is someone >>>>>>> tracking/coordinating this?) >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> I am coordinating the work on the failed tests, but I would >>>>>>> need other committer's help to perform the review. @Ahmet, could you >>>>>>> help >>>>>>> us prioritize the review for this PR? >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> The problem is there are too many manual changes. Reviewing >>>>>>> this change in this form will require a large effort. I do not think I >>>>>>> can >>>>>>> interrupt other projects to prioritize reviews on this PR. IMO, we have >>>>>>> a >>>>>>> few options: >>>>>>> >>>> >>>>>>> >>>> - PR to be restructured in the format suggested in this thread. >>>>>>> A commit for infrastructure changes from Jekyll to hugo. A second commit >>>>>>> for a script that will convert the majority of the content. A third >>>>>>> commit >>>>>>> for the execution of the script. And a fourth commit for the additional >>>>>>> manual content changes. If Nam can get to this form, people on this >>>>>>> thread >>>>>>> myself/Robert/Pablo/Brian can review the changes. >>>>>>> >>>> - Another option is, we can accept that we already invested in >>>>>>> this transition and overall this is a good change, and merge the PR >>>>>>> more or >>>>>>> less in its current form (with tests fixed and open comments addressed) >>>>>>> even though it has issues. And then overtime fix the issues we >>>>>>> encounter. >>>>>>> There was already some amount of review and visual comparisons, we risk >>>>>>> losing some recent content changes but I am assuming this will not be >>>>>>> much. >>>>>>> If Nam can commit to compare two sites after a merge, fixing the >>>>>>> majority >>>>>>> of the delta, this might be a viable option. >>>>>>> >>>> >>>>>>> >>>> Another thing we can do, we can archive/store a read-only copy >>>>>>> of the current website in an "archive" url temporarily instead of >>>>>>> completely deleting it. It will give us a baseline for a while to go >>>>>>> back >>>>>>> to the old content and move any missing data. (And maybe, someone can >>>>>>> come >>>>>>> up with an innovative way to compare the textual content of both >>>>>>> sites.) A >>>>>>> note on the stop world approach, I believe we are already failing on >>>>>>> that >>>>>>> with merge conflicts showing up on the PR. It will be better for us to >>>>>>> complete the transition as soon as possible. Fixing after the initial >>>>>>> merge >>>>>>> might be a simpler task, especially if we can archive the old site. >>>>>>> >>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> > Michal showed Nam how to handle the 1st test which was >>>>>>> about Apache License missing. >>>>>>> >>>>>> > >>>>>>> >>>>>> > However, the 2nd and 3rd tests looked like some kind of >>>>>>> permissions error on the Jenkins worker, not to be configured by code. >>>>>>> For >>>>>>> more details based on Jenkin logs, the 2nd test failed because of >>>>>>> website/www/site/themes and the 3rd test failed because of >>>>>>> website/www/node_modules, they are both auto-generated files on build. >>>>>>> Can >>>>>>> someone help Nam to look into this? >>>>>>> >>>>>> > >>>>>>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE >>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — >>>>>>> FAILURE >>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — >>>>>>> FAILURE >>>>>>> >>>>>> > >>>>>>> >>>>>> > 2) Are there any other blockers for merging? >>>>>>> @Ahmet/Robert/others please share if there are any other blockers. >>>>>>> >>>>>> > >>>>>>> >>>>>> > >>>>>>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494 >>>>>>> >>>>>> > >>>>>>> >>>>>> > >>>>>>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw < >>>>>>> rober...@google.com> wrote: >>>>>>> >>>>>> >> >>>>>>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay < >>>>>>> al...@google.com> wrote: >>>>>>> >>>>>> >> > >>>>>>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw < >>>>>>> rober...@google.com> wrote: >>>>>>> >>>>>> >> >>> >>>>>>> >>>>>> >> >>> I took the massive commit and split it up into: >>>>>>> >>>>>> >> >>> >>>>>>> >>>>>> >> >>> (1) Infrastructure changes (basically everything >>>>>>> outside of >>>>>>> >>>>>> >> >>> (website/www/site/content) >>>>>>> >>>>>> >> >>> (2) Sed script changes, and >>>>>>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)). >>>>>>> >>>>>> >> > >>>>>>> >>>>>> >> > >>>>>>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the >>>>>>> source of the sed script? I am not sure why some of those lines are >>>>>>> there. >>>>>>> It would be much easier for us to comment on the script source if it is >>>>>>> reviewable somewhere. >>>>>>> >>>>>> >> >>>>>>> >>>>>> >> I just gathered up common patterns as I was trying to go >>>>>>> through and >>>>>>> >>>>>> >> review the files... Mostly it was an exercise in finding a >>>>>>> compact >>>>>>> >>>>>> >> representation for the delta, not trying to be a perfect >>>>>>> conversion. >>>>>>> >>>>>> >> (I do think in retrospect, if we do something like this >>>>>>> again, it >>>>>>> >>>>>> >> would be preferable to commit a script that does the >>>>>>> auto-conversion >>>>>>> >>>>>> >> (maybe even with some patch files for manual changes) both >>>>>>> for ease of >>>>>>> >>>>>> >> reviewing and to avoid the stop-the-world situation we're >>>>>>> in now. (I'm >>>>>>> >>>>>> >> still worried that some changes will get lost in the >>>>>>> shuffle.) >>>>>>> >>>>>>