This is a tough situation.

It would have been much better if this transition was structured in
such a way that the review was more manageable (e.g. the suggestion of
scripts, not mixing in voluminous unnecessary changes like whitespace,
and not updating content), and possibly even incrementally (e.g. the
new site would have been developed over multiple PRs in a subdomain or
subdirectory while being worked on). But hindsight is 20/20 and no
one, myself included, thought to bring this up when the original
migration was proposed, so this is more something to keep in mind for
the future. I also appreciate the efforts that have been made to clean
things up (e.g. preserving history) and address feedback.

So, where do we go from here? My first thought is that I really don't
want to set a precedent that just because a PR "will require a large
effort" and in a state that if we don't "move forward and merge what
we have now" then "work done so far will be lost" means that we think
it's OK to forgo doing a proper review.

On the other hand, there are some mitigating factors with this being
the website and not the code in that "bugs," though possibly
embarrassing, won't break production pipelines or data loss, and
though the source is technically part of the release, when we find
something to fix we can fix the live website much more quickly than go
through the whole release process and convince people to upgrade. (I
recognize accepting this argument is, to some degree at least, saying
that we don't care about the correctness of docs as much as so-called
"real" code, if we go there.)

If we decide to go ahead and merge (and I would not object), there are
some things I would like to see.

(1) I would like to understand what we would do afterwards to "review
the outcome, and ensure that all the content is there," and why it
can't be done before merging instead. (Is it because it'd take time
and we don't want to incorporate changes that are made to the website
in the meantime? I think that boat has sailed, but maybe we can avoid
making it worse...)

(2) I spot checked the content, pulled out some common patterns, and
it mostly looks good, but there were also some issues (e.g. several
pages were replaced with the contents from entirely different pages).
I would be more comfortable if, say, a smoke test of comparing the old
and new sites, with html tags stripped and ignoring whitespace,
yielded what should be empty diffs.

(3) It'd be good to have someone give a stamp of approval on the
infrastructure changes, at least to validate that we're not going to
be taking on extra tech debt with regard to jenkins stability and
developer workflow. I see that Brian has at least looked at this some.

- Robert


On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy
<aizha...@apache.org> wrote:
>
> Thank you Ahmet.
>
> Robert/Brian, what do you think?
>
> The website staging and pre commit tests have passed [1]. If nobody has 
> objections, we could merge it soon.
>
>
> [1] https://github.com/apache/beam/pull/11554
>
> On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com> wrote:
>>
>>
>>
>> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> 
>> wrote:
>>>
>>> Thanks for the writeup Ahmet.
>>>
>>> My bias is to move forward and merge the PR. After this, we'll review the 
>>> outcome, and ensure that all the content is there. Nam will help us with 
>>> that.
>>> The reason that I'd like to move forward and merge what we have now - is 
>>> that if we don't do that, the work done so far will be lost.
>>> We'll make sure to stage the website in its current state, and use that as 
>>> reference/archive to ensure all the content have been moved.
>>>
>>> Is this reasonable to everyone?
>>
>>
>> This is reasonable to me. I agree with your reasons.
>>
>> What do others think?
>>
>>>
>>>
>>>
>>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy 
>>>> <aizha...@apache.org> wrote:
>>>>>>
>>>>>>
>>>>>> > 1) Currently, the main blocker for merging is Staging Test Failures.
>>>>>>
>>>>>> That and finishing the review. (Is someone tracking/coordinating this?)
>>>>>
>>>>>
>>>>> I am coordinating the work on the failed tests, but I would need other 
>>>>> committer's help to perform the review. @Ahmet, could you help us 
>>>>> prioritize the review for this PR?
>>>>
>>>>
>>>> The problem is there are too many manual changes. Reviewing this change in 
>>>> this form will require a large effort. I do not think I can interrupt 
>>>> other projects to prioritize reviews on this PR. IMO, we have a few 
>>>> options:
>>>>
>>>> - PR to be restructured in the format suggested in this thread. A commit 
>>>> for infrastructure changes from Jekyll to hugo. A second commit for a 
>>>> script that will convert the majority of the content. A third commit for 
>>>> the execution of the script. And a fourth commit for the additional manual 
>>>> content changes. If Nam can get to this form, people on this thread 
>>>> myself/Robert/Pablo/Brian can review the changes.
>>>> - Another option is, we can accept that we already invested in this 
>>>> transition and overall this is a good change, and merge the PR more or 
>>>> less in its current form (with tests fixed and open comments addressed) 
>>>> even though it has issues. And then overtime fix the issues we encounter. 
>>>> There was already some amount of review and visual comparisons, we risk 
>>>> losing some recent content changes but I am assuming this will not be 
>>>> much. If Nam can commit to compare two sites after a merge, fixing the 
>>>> majority of the delta, this might be a viable option.
>>>>
>>>> Another thing we can do, we can archive/store a read-only copy of the 
>>>> current website in an "archive" url temporarily instead of completely 
>>>> deleting it. It will give us a baseline for a while to go back to the old 
>>>> content and move any missing data. (And maybe, someone can come up with an 
>>>> innovative way to compare the textual content of both sites.) A note on 
>>>> the stop world approach, I believe we are already failing on that with 
>>>> merge conflicts showing up on the PR. It will be better for us to complete 
>>>> the transition as soon as possible. Fixing after the initial merge might 
>>>> be a simpler task, especially if we can archive the old site.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> > Michal showed Nam how to handle the 1st test which was about Apache 
>>>>>> > License missing.
>>>>>> >
>>>>>> > However, the 2nd and 3rd tests looked like some kind of permissions 
>>>>>> > error on the Jenkins worker, not to be configured by code. For more 
>>>>>> > details based on Jenkin logs, the 2nd test failed because of 
>>>>>> > website/www/site/themes and the 3rd test failed because of 
>>>>>> > website/www/node_modules, they are both auto-generated files on build. 
>>>>>> > Can someone help Nam to look into this?
>>>>>> >
>>>>>> > RAT ("Run RAT PreCommit") — FAILURE
>>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
>>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
>>>>>> >
>>>>>> > 2) Are there any other blockers for merging? @Ahmet/Robert/others 
>>>>>> > please share if there are any other blockers.
>>>>>> >
>>>>>> >
>>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>> >
>>>>>> >
>>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <rober...@google.com> 
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
>>>>>> >> >
>>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw 
>>>>>> >> >> <rober...@google.com> wrote:
>>>>>> >> >>>
>>>>>> >> >>> I took the massive commit and split it up into:
>>>>>> >> >>>
>>>>>> >> >>> (1) Infrastructure changes (basically everything outside of
>>>>>> >> >>> (website/www/site/content)
>>>>>> >> >>> (2) Sed script changes, and
>>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > Thank you Robert. This makes it much easier. What is the source of 
>>>>>> >> > the sed script? I am not sure why some of those lines are there. It 
>>>>>> >> > would be much easier for us to comment on the script source if it 
>>>>>> >> > is reviewable somewhere.
>>>>>> >>
>>>>>> >> I just gathered up common patterns as I was trying to go through and
>>>>>> >> review the files... Mostly it was an exercise in finding a compact
>>>>>> >> representation for the delta, not trying to be a perfect conversion.
>>>>>> >> (I do think in retrospect, if we do something like this again, it
>>>>>> >> would be preferable to commit a script that does the auto-conversion
>>>>>> >> (maybe even with some patch files for manual changes) both for ease of
>>>>>> >> reviewing and to avoid the stop-the-world situation we're in now. (I'm
>>>>>> >> still worried that some changes will get lost in the shuffle.)

Reply via email to