Also an update from +Nam Bui <nam....@polidea.com> :

"This commit [1] is up to date. I walked through all of the markdown files.
Apart from syntax changes between Jekyll and Hugo, wherever content had been
removed, added, or modified, I double-checked the text against the current
website. I can confirm that 98-99% (probably 100%, if I didn't miss any
typos) of the content is now synced, with 0 lost files. Thus, there should be
no unexpected content changes in this commit [1] anymore.


[1]
https://github.com/apache/beam/pull/11554/commits/94e624fb43aa2218150fd3d97333c58d3d9ff653
"
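
For reference, the smoke test proposed further down in this thread (compare
the old and new sites with HTML tags stripped and whitespace ignored) could be
sketched roughly as follows. This is only an illustration in Python using the
standard library. The names VisibleTextExtractor, visible_text, and text_diff
are made up here and are not part of the PR or of any Beam tooling, and
fetching the rendered pages from the two sites is left out.

```python
import difflib
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags stripped and whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined with a space so adjacent block elements do not fuse;
    # every run of whitespace is then collapsed to a single space.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def text_diff(old_html, new_html):
    """Word-level unified diff of the visible text; empty list if identical."""
    old_words = visible_text(old_html).split(" ")
    new_words = visible_text(new_html).split(" ")
    return list(
        difflib.unified_diff(old_words, new_words, "old", "new", lineterm="")
    )
```

Running text_diff over each pair of old (Jekyll) and new (Hugo) rendered pages
and flagging the non-empty results would give a list of pages to look at by
hand. Because chunks are joined with spaces, differences in inline markup
next to punctuation can show up as spurious word changes.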

On Fri, May 8, 2020 at 9:57 AM Aizhamal Nurmamat kyzy <aizha...@apache.org>
wrote:

> I understand the difficulty, and this certainly comes with lessons learned
> for future similar projects.
>
> To your questions Robert:
> (1 and 2) I will commit to review the text in the resulting pages. I will
> try and use some automation to extract visible text from each page and diff
> it with the current state of the website. I can do this starting next week.
> From some quick research, there seem to be tools that help with this
> analysis (
> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
> )
> By remaining in this state, we hold others up from making changes, or we
> increase the amount of work needed after merging to port over changes that
> may be missed. If we move forward, new changes can be done on top of the
> new website.
>
> (3) This makes sense. Brian, would you be able to spend some time to look
> at the automation changes (build files and scripts) to ensure they look
> fine?
>
> I would also like to write a post mortem to extract lessons learned and
> avoid this situation in the future.
>
>
> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com> wrote:
>
>> I'm -0 on merging as-is. I have the same concerns as Robert and he's
>> voiced them very well so I won't waste time re-airing them.
>>
>>> (2) I spot checked the content, pulled out some common patterns, and
>>> it mostly looks good, but there were also some issues (e.g. several
>>> pages were replaced with the contents from entirely different pages).
>>> I would be more comfortable if, say, a smoke test of comparing the old
>>> and new sites, with html tags stripped and ignoring whitespace,
>>> yielded what should be empty diffs.
>>>
>>
>> Can you share any details about this analysis?
>>
>> +1 for verifying the old and new are the same by diffing the output.
>>
>>
>>> (3) It'd be good to have someone give a stamp of approval on the
>>> infrastructure changes, at least to validate that we're not going to
>>> be taking on extra tech debt with regard to jenkins stability and
>>> developer workflow. I see that Brian has at least looked at this some.
>>
>>
>> My involvement so far was just recognizing a problem (creating root-owned
>> files on jenkins workers) and helping to fix it. If there's anyone
>> available who's familiar with the website infrastructure it would be great
>> if they could take a look instead (if not I could probably acquaint myself
>> enough to review).
>>
>> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> This is a tough situation.
>>>
>>> It would have been much better if this transition was structured in
>>> such a way that the review was more manageable (e.g. the suggestion of
>>> scripts, not mixing in voluminous unnecessary changes like whitespace,
>>> and not updating content), and possibly even incrementally (e.g. the
>>> new site would have been developed over multiple PRs in a subdomain or
>>> subdirectory while being worked on). But hindsight is 20/20 and no
>>> one, myself included, thought to bring this up when the original
>>> migration was proposed, so this is more something to keep in mind for
>>> the future. I also appreciate the efforts that have been made to clean
>>> things up (e.g. preserving history) and address feedback.
>>>
>>> So, where do we go from here? My first thought is that I really don't
>>> want to set a precedent that a PR that "will require a large effort" to
>>> review, and is in a state where if we don't "move forward and merge what
>>> we have now" then "work done so far will be lost," makes it OK to forgo
>>> a proper review.
>>>
>>> On the other hand, there are some mitigating factors with this being
>>> the website and not the code, in that "bugs," though possibly
>>> embarrassing, won't break production pipelines or cause data loss, and
>>> though the source is technically part of the release, when we find
>>> something to fix we can fix the live website much more quickly than go
>>> through the whole release process and convince people to upgrade. (I
>>> recognize accepting this argument is, to some degree at least, saying
>>> that we don't care about the correctness of docs as much as so-called
>>> "real" code, if we go there.)
>>>
>>> If we decide to go ahead and merge (and I would not object), there are
>>> some things I would like to see.
>>>
>>> (1) I would like to understand what we would do afterwards to "review
>>> the outcome, and ensure that all the content is there," and why it
>>> can't be done before merging instead. (Is it because it'd take time
>>> and we don't want to incorporate changes that are made to the website
>>> in the meantime? I think that boat has sailed, but maybe we can avoid
>>> making it worse...)
>>>
>>> (2) I spot checked the content, pulled out some common patterns, and
>>> it mostly looks good, but there were also some issues (e.g. several
>>> pages were replaced with the contents from entirely different pages).
>>> I would be more comfortable if, say, a smoke test of comparing the old
>>> and new sites, with html tags stripped and ignoring whitespace,
>>> yielded what should be empty diffs.
>>>
>>> (3) It'd be good to have someone give a stamp of approval on the
>>> infrastructure changes, at least to validate that we're not going to
>>> be taking on extra tech debt with regard to jenkins stability and
>>> developer workflow. I see that Brian has at least looked at this some.
>>>
>>> - Robert
>>>
>>>
>>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy
>>> <aizha...@apache.org> wrote:
>>> >
>>> > Thank you Ahmet.
>>> >
>>> > Robert/Brian, what do you think?
>>> >
>>> > The website staging and pre commit tests have passed [1]. If nobody
>>> > has objections, we could merge it soon.
>>> >
>>> >
>>> > [1] https://github.com/apache/beam/pull/11554
>>> >
>>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
>>> >>>
>>> >>> Thanks for the writeup Ahmet.
>>> >>>
>>> >>> My bias is to move forward and merge the PR. After this, we'll
>>> >>> review the outcome, and ensure that all the content is there. Nam
>>> >>> will help us with that.
>>> >>> The reason that I'd like to move forward and merge what we have now
>>> >>> is that if we don't do that, the work done so far will be lost.
>>> >>> We'll make sure to stage the website in its current state, and use
>>> >>> that as reference/archive to ensure all the content has been moved.
>>> >>>
>>> >>> Is this reasonable to everyone?
>>> >>
>>> >>
>>> >> This is reasonable to me. I agree with your reasons.
>>> >>
>>> >> What do others think?
>>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test
>>> >>>>>> > Failures.
>>> >>>>>>
>>> >>>>>> That and finishing the review. (Is someone tracking/coordinating
>>> >>>>>> this?)
>>> >>>>>
>>> >>>>>
>>> >>>>> I am coordinating the work on the failed tests, but I would need
>>> >>>>> other committers' help to perform the review. @Ahmet, could you
>>> >>>>> help us prioritize the review for this PR?
>>> >>>>
>>> >>>>
>>> >>>> The problem is there are too many manual changes. Reviewing this
>>> >>>> change in this form will require a large effort. I do not think I
>>> >>>> can interrupt other projects to prioritize reviews on this PR. IMO,
>>> >>>> we have a few options:
>>> >>>>
>>> >>>> - PR to be restructured in the format suggested in this thread: a
>>> >>>> commit for infrastructure changes from Jekyll to Hugo, a second
>>> >>>> commit for a script that will convert the majority of the content,
>>> >>>> a third commit for the execution of the script, and a fourth commit
>>> >>>> for the additional manual content changes. If Nam can get to this
>>> >>>> form, people on this thread (myself/Robert/Pablo/Brian) can review
>>> >>>> the changes.
>>> >>>> - Another option is to accept that we already invested in this
>>> >>>> transition and that overall this is a good change, and merge the PR
>>> >>>> more or less in its current form (with tests fixed and open comments
>>> >>>> addressed) even though it has issues, and then fix the issues we
>>> >>>> encounter over time. There was already some amount of review and
>>> >>>> visual comparison; we risk losing some recent content changes, but
>>> >>>> I am assuming this will not be much. If Nam can commit to comparing
>>> >>>> the two sites after a merge and fixing the majority of the delta,
>>> >>>> this might be a viable option.
>>> >>>>
>>> >>>> Another thing we can do is archive/store a read-only copy of the
>>> >>>> current website at an "archive" URL temporarily instead of
>>> >>>> completely deleting it. It will give us a baseline for a while to go
>>> >>>> back to the old content and move any missing data. (And maybe
>>> >>>> someone can come up with an innovative way to compare the textual
>>> >>>> content of both sites.) A note on the stop-the-world approach: I
>>> >>>> believe we are already failing on that, with merge conflicts showing
>>> >>>> up on the PR. It will be better for us to complete the transition as
>>> >>>> soon as possible. Fixing after the initial merge might be a simpler
>>> >>>> task, especially if we can archive the old site.
>>> >>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>>
>>> >>>>>> > Michal showed Nam how to handle the 1st test, which was about
>>> >>>>>> > the Apache License missing.
>>> >>>>>> >
>>> >>>>>> > However, the 2nd and 3rd tests looked like some kind of
>>> >>>>>> > permissions error on the Jenkins worker, not something
>>> >>>>>> > configured by code. For more details, based on the Jenkins logs,
>>> >>>>>> > the 2nd test failed because of website/www/site/themes and the
>>> >>>>>> > 3rd test failed because of website/www/node_modules; both are
>>> >>>>>> > auto-generated on build. Can someone help Nam look into this?
>>> >>>>>> >
>>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE
>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
>>> >>>>>> >
>>> >>>>>> > 2) Are there any other blockers for merging?
>>> >>>>>> > @Ahmet/Robert/others please share if there are any other blockers.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <rober...@google.com> wrote:
>>> >>>>>> >>
>>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
>>> >>>>>> >> >
>>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw <rober...@google.com> wrote:
>>> >>>>>> >> >>>
>>> >>>>>> >> >>> I took the massive commit and split it up into:
>>> >>>>>> >> >>>
>>> >>>>>> >> >>> (1) Infrastructure changes (basically everything outside of
>>> >>>>>> >> >>> website/www/site/content),
>>> >>>>>> >> >>> (2) Sed script changes, and
>>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
>>> >>>>>> >> >
>>> >>>>>> >> >
>>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the
>>> >>>>>> >> > source of the sed script? I am not sure why some of those
>>> >>>>>> >> > lines are there. It would be much easier for us to comment on
>>> >>>>>> >> > the script source if it is reviewable somewhere.
>>> >>>>>> >>
>>> >>>>>> >> I just gathered up common patterns as I was trying to go
>>> >>>>>> >> through and review the files... Mostly it was an exercise in
>>> >>>>>> >> finding a compact representation for the delta, not trying to
>>> >>>>>> >> be a perfect conversion. (I do think in retrospect, if we do
>>> >>>>>> >> something like this again, it would be preferable to commit a
>>> >>>>>> >> script that does the auto-conversion (maybe even with some
>>> >>>>>> >> patch files for manual changes), both for ease of reviewing and
>>> >>>>>> >> to avoid the stop-the-world situation we're in now. I'm still
>>> >>>>>> >> worried that some changes will get lost in the shuffle.)
>>>
>>
