Hi,

@Ahmet: Yeah, it's all clear to me. :)
@Robert: Thanks for your ideas and also the script. It really helps me to
serve my works.

Best regard!

On Sat, May 9, 2020 at 2:10 AM Ahmet Altay <al...@google.com> wrote:

> This sounds reasonable to me. Thank you. Nam, does it make sense to you?
>
> On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> I'd really like to not see this work go to waste, both the original
>> revision, the further efforts Nam has done in making it more manageable to
>> review, and the work put into reviewing this so far, so we can get the
>> benefits of being on Hugo. How about this for a concrete proposal:
>>
>> (1) We get "standard" approval from one or more committers for the
>> infrastructure changes, just as with any other PR. Brian has
>> already started this, but if others could step up as well that'd be great.
>>
>> (2) Reviewers (and authors) typically count on (or request) sufficient
>> automated test coverage to augment the fact that their eyeballs are
>> fallible, which is something that is missing here (and given the size of
>> the change not easily compensated for by a more detailed manual review).
>> How about we use the script above (or similar) as an automated test to
>> validate the website's contents haven't (materially) changed. I feel we've
>> validated enough that the style looks good via spot checking (which is
>> something that should work on all pages if it works on one). The diff
>> between the current site and the newly generated site should be empty (it
>> might already be [1]), or at least we should get a stamp of approval on the
>> plain-text diff (which should be small), before merging.
>>
>> (3) To make things easier, everyone holds off on making any changes to
>> the old site until a fixed future date (say, next Wednesday). Hopefully we
>> can get it merged by then. If not, a condition for merging would be a
>> commitment incorporating new changes after this date.
>>
>> Does this sound reasonable?
>>
>> - Robert
>>
>>
>>
>> [1] I'd be curious as to how small the diff already is, but my script
>> relies on local directories with the generated HTML, which I don't have
>> handy at the moment.
>>
>>
>>
>> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> Here's a script that we could run on the old and new sites that should
>>> quickly catch any major issues but not get caught up in formatting minutia.
>>>
>>>
>>>
>>> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <
>>>> aizha...@apache.org> wrote:
>>>>
>>>>> I understand the difficulty, and this certainly comes with lessons
>>>>> learned for future similar projects.
>>>>>
>>>>> To your questions Robert:
>>>>> (1 and 2) I will commit to review the text in the resulting pages. I
>>>>> will try and use some automation to extract visible text from each page 
>>>>> and
>>>>> diff it with the current state of the website. I can do this starting next
>>>>> week. From some quick research, there seem to be tools that help with this
>>>>> analysis (
>>>>> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
>>>>> )
>>>>>
>>>>
>>>> At first glance it looks like these tools would give diffs that are
>>>> *larger* than the 47K one we're struggling to review here.
>>>>
>>>>
>>>>> By remaining in this state, we hold others up from making changes, or
>>>>> we increase the amount of work needed after merging to port over changes
>>>>> that may be missed. If we move forward, new changes can be done on top of
>>>>> the new website.
>>>>>
>>>>
>>>> I agree we don't want to hold others up from making changes. However,
>>>> the amount of work to port changes over seems small in comparison to
>>>> everything else that is being discussed here. (It also provides good
>>>> incentives to reach the bar quickly and has the advantage of falling on the
>>>> right people.) (3) will still take some time.
>>>>
>>>> If we go this route, we're lowering the bar for doc changes, but not
>>>> removing it.
>>>>
>>>>
>>>>> (3) This makes sense. Brian, would you be able to spend some time to
>>>>> look at the automation changes (build files and scripts) to ensure they
>>>>> look fine?
>>>>>
>>>>> I would also like to write a post mortem to extract lessons learned
>>>>> and avoid this situation in the future.
>>>>>
>>>>>
>>>>> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I'm -0 on merging as-is. I have the same concerns as Robert and he's
>>>>>> voiced them very well so I won't waste time re-airing them.
>>>>>>
>>>>>> (2) I spot checked the content, pulled out some common patterns, and
>>>>>>> it mostly looks good, but there were also some issues (e.g. several
>>>>>>> pages were replaced with the contents from entirely different pages).
>>>>>>> I would be more comfortable if, say, a smoke test of comparing the
>>>>>>> old
>>>>>>> and new sites, with html tags stripped and ignoring whitespace,
>>>>>>> yielded what should be empty diffs.
>>>>>>>
>>>>>>
>>>>>> Can you share any details about this analysis?
>>>>>>
>>>>>
>>>> It was basically paging through the diff, adding things to the sed
>>>> script, and then looking at more diffs.
>>>>
>>>>
>>>>> +1 for verifying the old and new are the same by diffing the output.
>>>>>>
>>>>>>
>>>>>>> (3) It'd be good to have someone give a stamp of approval on the
>>>>>>> infrastructure changes, at least to validate that we're not going to
>>>>>>> be taking on extra tech debt with regard to jenkins stability and
>>>>>>> developer workflow. I see that Brian has at least looked at this
>>>>>>> some.
>>>>>>
>>>>>>
>>>>>> My involvement so far was just recognizing a problem (creating
>>>>>> root-owned files on jenkins workers) and helping to fix it. If there's
>>>>>> anyone available who's familiar with the website infrastructure it would 
>>>>>> be
>>>>>> great if they could take a look instead (if not I could probably acquaint
>>>>>> myself enough to review).
>>>>>>
>>>>>> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is a tough situation.
>>>>>>>
>>>>>>> It would have been much better if this transition was structured in
>>>>>>> such a way that the review was more manageable (e.g. the suggestion
>>>>>>> of
>>>>>>> scripts, not mixing in voluminous unnecessary changes like
>>>>>>> whitespace,
>>>>>>> and not updating content), and possibly even incrementally (e.g. the
>>>>>>> new site would have been developed over multiple PRs in a subdomain
>>>>>>> or
>>>>>>> subdirectory while being worked on). But hindsight is 20/20 and no
>>>>>>> one, myself included, thought to bring this up when the original
>>>>>>> migration was proposed, so this is more something to keep in mind for
>>>>>>> the future. I also appreciate the efforts that have been made to
>>>>>>> clean
>>>>>>> things up (e.g. preserving history) and address feedback.
>>>>>>>
>>>>>>> So, where do we go from here? My first thought is that I really don't
>>>>>>> want to set a precedent that just because a PR "will require a large
>>>>>>> effort" and in a state that if we don't "move forward and merge what
>>>>>>> we have now" then "work done so far will be lost" means that we think
>>>>>>> it's OK to forgo doing a proper review.
>>>>>>>
>>>>>>> On the other hand, there are some mitigating factors with this being
>>>>>>> the website and not the code in that "bugs," though possibly
>>>>>>> embarrassing, won't break production pipelines or data loss, and
>>>>>>> though the source is technically part of the release, when we find
>>>>>>> something to fix we can fix the live website much more quickly than
>>>>>>> go
>>>>>>> through the whole release process and convince people to upgrade. (I
>>>>>>> recognize accepting this argument is, to some degree at least, saying
>>>>>>> that we don't care about the correctness of docs as much as so-called
>>>>>>> "real" code, if we go there.)
>>>>>>>
>>>>>>> If we decide to go ahead and merge (and I would not object), there
>>>>>>> are
>>>>>>> some things I would like to see.
>>>>>>>
>>>>>>> (1) I would like to understand what we would do afterwards to "review
>>>>>>> the outcome, and ensure that all the content is there," and why it
>>>>>>> can't be done before merging instead. (Is it because it'd take time
>>>>>>> and we don't want to incorporate changes that are made to the website
>>>>>>> in the meantime? I think that boat has sailed, but maybe we can avoid
>>>>>>> making it worse...)
>>>>>>>
>>>>>>> (2) I spot checked the content, pulled out some common patterns, and
>>>>>>> it mostly looks good, but there were also some issues (e.g. several
>>>>>>> pages were replaced with the contents from entirely different pages).
>>>>>>> I would be more comfortable if, say, a smoke test of comparing the
>>>>>>> old
>>>>>>> and new sites, with html tags stripped and ignoring whitespace,
>>>>>>> yielded what should be empty diffs.
>>>>>>>
>>>>>>> (3) It'd be good to have someone give a stamp of approval on the
>>>>>>> infrastructure changes, at least to validate that we're not going to
>>>>>>> be taking on extra tech debt with regard to jenkins stability and
>>>>>>> developer workflow. I see that Brian has at least looked at this
>>>>>>> some.
>>>>>>>
>>>>>>> - Robert
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy
>>>>>>> <aizha...@apache.org> wrote:
>>>>>>> >
>>>>>>> > Thank you Ahmet.
>>>>>>> >
>>>>>>> > Robert/Brian, what do you think?
>>>>>>> >
>>>>>>> > The website staging and pre commit tests have passed [1]. If
>>>>>>> nobody has objections, we could merge it soon.
>>>>>>> >
>>>>>>> >
>>>>>>> > [1] https://github.com/apache/beam/pull/11554
>>>>>>> >
>>>>>>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <
>>>>>>> aizha...@apache.org> wrote:
>>>>>>> >>>
>>>>>>> >>> Thanks for the writeup Ahmet.
>>>>>>> >>>
>>>>>>> >>> My bias is to move forward and merge the PR. After this, we'll
>>>>>>> review the outcome, and ensure that all the content is there. Nam will 
>>>>>>> help
>>>>>>> us with that.
>>>>>>> >>> The reason that I'd like to move forward and merge what we have
>>>>>>> now - is that if we don't do that, the work done so far will be lost.
>>>>>>> >>> We'll make sure to stage the website in its current state, and
>>>>>>> use that as reference/archive to ensure all the content have been moved.
>>>>>>> >>>
>>>>>>> >>> Is this reasonable to everyone?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> This is reasonable to me. I agree with your reasons.
>>>>>>> >>
>>>>>>> >> What do others think?
>>>>>>> >>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy <
>>>>>>> aizha...@apache.org> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test
>>>>>>> Failures.
>>>>>>> >>>>>>
>>>>>>> >>>>>> That and finishing the review. (Is someone
>>>>>>> tracking/coordinating this?)
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> I am coordinating the work on the failed tests, but I would
>>>>>>> need other committer's help to perform the review. @Ahmet, could you 
>>>>>>> help
>>>>>>> us prioritize the review for this PR?
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> The problem is there are too many manual changes. Reviewing
>>>>>>> this change in this form will require a large effort. I do not think I 
>>>>>>> can
>>>>>>> interrupt other projects to prioritize reviews on this PR. IMO, we have 
>>>>>>> a
>>>>>>> few options:
>>>>>>> >>>>
>>>>>>> >>>> - PR to be restructured in the format suggested in this thread.
>>>>>>> A commit for infrastructure changes from Jekyll to hugo. A second commit
>>>>>>> for a script that will convert the majority of the content. A third 
>>>>>>> commit
>>>>>>> for the execution of the script. And a fourth commit for the additional
>>>>>>> manual content changes. If Nam can get to this form, people on this 
>>>>>>> thread
>>>>>>> myself/Robert/Pablo/Brian can review the changes.
>>>>>>> >>>> - Another option is, we can accept that we already invested in
>>>>>>> this transition and overall this is a good change, and merge the PR 
>>>>>>> more or
>>>>>>> less in its current form (with tests fixed and open comments addressed)
>>>>>>> even though it has issues. And then overtime fix the issues we 
>>>>>>> encounter.
>>>>>>> There was already some amount of review and visual comparisons, we risk
>>>>>>> losing some recent content changes but I am assuming this will not be 
>>>>>>> much.
>>>>>>> If Nam can commit to compare two sites after a merge, fixing the 
>>>>>>> majority
>>>>>>> of the delta, this might be a viable option.
>>>>>>> >>>>
>>>>>>> >>>> Another thing we can do, we can archive/store a read-only copy
>>>>>>> of the current website in an "archive" url temporarily instead of
>>>>>>> completely deleting it. It will give us a baseline for a while to go 
>>>>>>> back
>>>>>>> to the old content and move any missing data. (And maybe, someone can 
>>>>>>> come
>>>>>>> up with an innovative way to compare the textual content of both 
>>>>>>> sites.) A
>>>>>>> note on the stop world approach, I believe we are already failing on 
>>>>>>> that
>>>>>>> with merge conflicts showing up on the PR. It will be better for us to
>>>>>>> complete the transition as soon as possible. Fixing after the initial 
>>>>>>> merge
>>>>>>> might be a simpler task, especially if we can archive the old site.
>>>>>>> >>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> > Michal showed Nam how to handle the 1st test which was
>>>>>>> about Apache License missing.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > However, the 2nd and 3rd tests looked like some kind of
>>>>>>> permissions error on the Jenkins worker, not to be configured by code. 
>>>>>>> For
>>>>>>> more details based on Jenkin logs, the 2nd test failed because of
>>>>>>> website/www/site/themes and the 3rd test failed because of
>>>>>>> website/www/node_modules, they are both auto-generated files on build. 
>>>>>>> Can
>>>>>>> someone help Nam to look into this?
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE
>>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") —
>>>>>>> FAILURE
>>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") —
>>>>>>> FAILURE
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > 2) Are there any other blockers for merging?
>>>>>>> @Ahmet/Robert/others please share if there are any other blockers.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>> >>>>>> >
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <
>>>>>>> rober...@google.com> wrote:
>>>>>>> >>>>>> >>
>>>>>>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <
>>>>>>> al...@google.com> wrote:
>>>>>>> >>>>>> >> >
>>>>>>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw <
>>>>>>> rober...@google.com> wrote:
>>>>>>> >>>>>> >> >>>
>>>>>>> >>>>>> >> >>> I took the massive commit and split it up into:
>>>>>>> >>>>>> >> >>>
>>>>>>> >>>>>> >> >>> (1) Infrastructure changes (basically everything
>>>>>>> outside of
>>>>>>> >>>>>> >> >>> (website/www/site/content)
>>>>>>> >>>>>> >> >>> (2) Sed script changes, and
>>>>>>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
>>>>>>> >>>>>> >> >
>>>>>>> >>>>>> >> >
>>>>>>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the
>>>>>>> source of the sed script? I am not sure why some of those lines are 
>>>>>>> there.
>>>>>>> It would be much easier for us to comment on the script source if it is
>>>>>>> reviewable somewhere.
>>>>>>> >>>>>> >>
>>>>>>> >>>>>> >> I just gathered up common patterns as I was trying to go
>>>>>>> through and
>>>>>>> >>>>>> >> review the files... Mostly it was an exercise in finding a
>>>>>>> compact
>>>>>>> >>>>>> >> representation for the delta, not trying to be a perfect
>>>>>>> conversion.
>>>>>>> >>>>>> >> (I do think in retrospect, if we do something like this
>>>>>>> again, it
>>>>>>> >>>>>> >> would be preferable to commit a script that does the
>>>>>>> auto-conversion
>>>>>>> >>>>>> >> (maybe even with some patch files for manual changes) both
>>>>>>> for ease of
>>>>>>> >>>>>> >> reviewing and to avoid the stop-the-world situation we're
>>>>>>> in now. (I'm
>>>>>>> >>>>>> >> still worried that some changes will get lost in the
>>>>>>> shuffle.)
>>>>>>>
>>>>>>

Reply via email to