Thank you to everybody who worked on this migration. 

I just wanted to clarify: does this mean that, now that it's finished, we can 
update the website as before?

> On 15 May 2020, at 01:33, Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> 
> Thank you Ahmet, Brian, Robert and everyone else who spent time working on 
> this. The pull requests are now merged and the website seems to be working as 
> expected [1].
> 
> One minor issue that I have noticed is that the code blocks have a grey 
> background, which makes it less accessible than before. I created a Jira 
> issue for this [2], and will follow up to get it fixed. If you notice any 
> other issues, please file Jira issues and let me know.
> 
> Hope you are all safe,
> Aizhamal
> 
> [1] https://beam.apache.org/
> [2] https://issues.apache.org/jira/browse/BEAM-10001
> On Thu, May 14, 2020 at 11:25 AM Pablo Estrada <pabl...@google.com> wrote:
> Here's a zipped-up tree from a staged sample of the website: 
> https://drive.google.com/file/d/1LKL936tBJ79jpjvlL5vC5uYYwTHsWXiJ/view?usp=sharing
> 
> I'd also suggest tagging the commit, so we can find the first commit later on 
> for reference. I can push the tag after the PR is merged.
> 
> On Thu, May 14, 2020 at 10:43 AM Ahmet Altay <al...@google.com> wrote:
> 
> 
> On Thu, May 14, 2020 at 9:16 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> Thank you all for reviewing and validating this pull request. I see that all 
> tests are passing now, should we merge it?
> 
> +1 to merging now.
> 
> Before the merge, please share a link to an archive copy of the old website. 
> After the merge, please try out the live website to see if it is working as 
> expected.
>  
> 
> On Wed, May 13, 2020, 5:41 PM Ahmet Altay <al...@google.com> wrote:
> Thank you! Let's merge it once tests are done.
> 
> On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw <rober...@google.com> wrote:
> I took a (non-comprehensive) look at these as well, and didn't see any 
> issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and 
> everyone else. 
> 
> On Wed, May 13, 2020 at 7:58 AM Nam Bui <nam....@polidea.com> wrote:
> Hi Ahmet,
> "Does this mean the internal links (e.g. contribute/team) will disappear?"
> Yes, I'd like to get rid of them. To make sure they don't reappear and 
> confuse people, I replaced every use of "contribute/team" with the external 
> link. Currently, we only have 2 "redirect_to" links, "contribute/team" and 
> "contribute/project/team", so this change won't have any other effects.
> Also, based on your question, I just added a section in the documentation 
> (CONTRIBUTE.md), which mentions the replaced/removed features of Jekyll in 
> terms of writing a new blog post or documentation in Hugo.
> 
> Got it. The main effect will be that if anyone has a bookmark/link to these 
> pages, those links will no longer work. That is fine if it is limited to 
> these 2 URLs.
>  
> 
> 
> On Wed, May 13, 2020 at 4:17 AM Ahmet Altay <al...@google.com> wrote:
> - I reviewed the diff output with Nam's explanations. The change looks 
> minimal. Large diffs are primarily coming from index and redirect files. 
> Codeblocks have differences but the content is seemingly preserved. IIUC, the 
> source of truth is snippet files anyway. (It would be good to get one more 
> set of eyes on this.)
> - Brian and I reviewed the infrastructure changes. They look reasonable.
> 
> I think the PR is very close to a mergeable state. Especially if we can get an 
> archive copy of the current website, I will be comfortable with the merge.
> 
> And, thank you Nam for your work so far.
> 
> On Tue, May 12, 2020 at 4:13 PM Nam Bui <nam....@polidea.com> wrote:
> Hi,
> 
> A new commit covering Robert's script has been pushed [1], and the script 
> output is attached to this email.
> 
> Based on the diff output of the script, my strategy is to look at the 
> sections with large amounts of removed text, to make sure that no content 
> or files were lost. Below are all of the links that have large amounts of 
> removed content.
> 
> - Detection:
> These links lost some of the contents. Fixed!
> + documentation/runners/jstorm/index.html
> + documentation/dsls/sql/calcite/lexical-structure/index.html
> + documentation/dsls/sql/zetasql/data-types/index.html
> + documentation/dsls/sql/zetasql/query-syntax/index.html
> 
> - Aliases:
> These links are redirected links. So in Hugo, these HTML files only include 
> redirected URLs. I also took a look at them to ensure the content was there.
> + documentation/dsls/sql/calcite/lexical/index.html
> + old URLs of blog posts
> 
> - Ignore:
> Hugo and Jekyll have different structures of code highlighters rendering in 
> HTML. Ahmet & Pablo agree with me that it's fair to ignore them for now.
> + codeblocks
> 
> - Missing files:
> The script reports a “missing file” status for some files:
> + coming-soon.html (this file was used nowhere in Jekyll, so I didn’t migrate 
> to Hugo)
> + documentation/dsls/sql/statements/select/index.html (aliases)
> + blog/2019/04/25/beam-2.12.0.html (fixed!)
> + blog/2020/05/08/beam-summit-digital-2020.html (new blog post, added!)
> + v2/index.html (this file was used nowhere in Jekyll, so I didn’t migrate to 
> Hugo)
> + contribute/team/index.html (mentioned in “redirect_to” below)
> + contribute/project/team/index.html (mentioned in “redirect_to” below)
> 
> - “redirect_to”:
> In Jekyll, there is a feature called “redirect_to”. For instance, you click 
> on an internal link “contribute/team/” to reach the markdown “team.md”, then 
> from the markdown file, it redirects you to the external URL 
> “https://example.com”.
> However, there is no such feature in Hugo. My solution is to directly replace 
> “contribute/team/” with “https://example.com”.
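
A minimal sketch of the link replacement described above, assuming the content files are Markdown or HTML text. The `REDIRECTS` mapping and the `replace_redirect_links` helper are hypothetical names used for illustration, and "https://example.com/" is a placeholder for the real external target; this is not the code from the PR.

```python
import re

# Hypothetical mapping: internal paths that used Jekyll's "redirect_to",
# each paired with a placeholder external target.
REDIRECTS = {
    "/contribute/team/": "https://example.com/",
    "/contribute/project/team/": "https://example.com/",
}

# A link target is assumed to sit inside Markdown parentheses or inside the
# quotes of an HTML attribute, so require one of those on each side.
_BEFORE = r"""(?<=[("'])"""
_AFTER = r"""(?=[)"'])"""


def replace_redirect_links(text, redirects):
    """Return text with each internal link target swapped for its external URL."""
    for internal, external in redirects.items():
        pattern = re.compile(_BEFORE + re.escape(internal) + _AFTER)
        # A function replacement keeps the URL literal (no backslash escapes).
        text = pattern.sub(lambda _match, url=external: url, text)
    return text
```

For example, `replace_redirect_links("[team](/contribute/team/)", REDIRECTS)` rewrites the link target to the external URL, while a longer path such as `/contribute/team/members/` is left alone.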
> 
> Does this mean the internal links (e.g. contribute/team) will disappear?
>  
> 
> [1] https://github.com/apache/beam/pull/11554
> On Mon, May 11, 2020 at 7:34 PM Nam Bui <nam....@polidea.com> wrote:
> Updates for today:
> - Thanks Brian & Ahmet for your reviews. I left my comments for some of the 
> questions and also adapted new changes to the reviews [1].
> - I see that the new blog post was merged yesterday, so I added it to the PR 
> as well.
> 
> I briefly tried the script from Robert with the input of build files from old 
> and new websites. It seemed to work well in terms of detecting missing files 
> (or probably wrong links leading to missing files). I will push another 
> commit to fix all that up, hopefully tomorrow.
> 
> [1] https://github.com/apache/beam/pull/11554#issuecomment-626792031
> 
> Best regards,
> Nam
> 
> 
> On Mon, May 11, 2020 at 9:01 AM Nam Bui <nam....@polidea.com> wrote:
> Hi,
> 
> @Ahmet: Yeah, it's all clear to me. :)
> @Robert: Thanks for your ideas and also the script. It really helps me with 
> my work.
> 
> Best regards!
> 
> On Sat, May 9, 2020 at 2:10 AM Ahmet Altay <al...@google.com> wrote:
> This sounds reasonable to me. Thank you. Nam, does it make sense to you?
> 
> On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw <rober...@google.com> wrote:
> I'd really like to not see this work go to waste (the original revision, 
> the further efforts Nam has made to make it more manageable to review, and 
> the work put into reviewing this so far), so that we can get the benefits of 
> being on Hugo. How about this for a concrete proposal: 
> 
> (1) We get "standard" approval from one or more committers for the 
> infrastructure changes, just as with any other PR. Brian has already started 
> this, but if others could step up as well that'd be great. 
> 
> (2) Reviewers (and authors) typically count on (or request) sufficient 
> automated test coverage to augment the fact that their eyeballs are fallible, 
> which is something that is missing here (and given the size of the change not 
> easily compensated for by a more detailed manual review). How about we use 
> the script above (or similar) as an automated test to validate the website's 
> contents haven't (materially) changed. I feel we've validated enough that the 
> style looks good via spot checking (which is something that should work on 
> all pages if it works on one). The diff between the current site and the 
> newly generated site should be empty (it might already be [1]), or at least 
> we should get a stamp of approval on the plain-text diff (which should be 
> small), before merging. 
> 
> (3) To make things easier, everyone holds off on making any changes to the 
> old site until a fixed future date (say, next Wednesday). Hopefully we can 
> get it merged by then. If not, a condition for merging would be a commitment 
> to incorporate new changes made after this date. 
> 
> Does this sound reasonable? 
> 
> - Robert
> 
> 
> 
> [1] I'd be curious as to how small the diff already is, but my script relies 
> on local directories with the generated HTML, which I don't have handy at the 
> moment. 
> 
> 
> 
> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw <rober...@google.com> wrote:
> Here's a script that we could run on the old and new sites that should 
> quickly catch any major issues but not get caught up in formatting minutiae. 
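
The script itself is not included in this text. As an illustration only, a comparison of the kind discussed in this thread (strip HTML tags, ignore whitespace, diff the remaining text, and report files missing from the new build) might be sketched as follows. This is not Robert's script: the function names are hypothetical, and the two arguments are assumed to be local directories holding the generated HTML of the old and new sites.

```python
import difflib
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def compare_sites(old_root, new_root):
    """Return (missing, diffs): pages absent from new_root, and per-page word diffs."""
    old_root, new_root = Path(old_root), Path(new_root)
    missing, diffs = [], {}
    for old_file in sorted(old_root.rglob("*.html")):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            missing.append(rel.as_posix())
            continue
        old_words = visible_text(old_file.read_text(encoding="utf-8", errors="replace")).split()
        new_words = visible_text(new_file.read_text(encoding="utf-8", errors="replace")).split()
        if old_words != new_words:
            # Diff word by word so that re-wrapped lines do not show up as changes.
            diffs[rel.as_posix()] = list(
                difflib.unified_diff(old_words, new_words, lineterm="", n=2)
            )
    return missing, diffs
```

A call such as `missing, diffs = compare_sites("old_site", "new_site")` (both paths hypothetical) would then give a list of missing pages plus a small word-level diff for each page whose text changed. Text split differently across inline tags can still show up as a difference in this sketch.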
> 
> 
> 
> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw <rober...@google.com> wrote:
> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> I understand the difficulty, and this certainly comes with lessons learned 
> for future similar projects. 
> 
> To your questions Robert:
> (1 and 2) I will commit to review the text in the resulting pages. I will try 
> and use some automation to extract visible text from each page and diff it 
> with the current state of the website. I can do this starting next week. From 
> some quick research, there seem to be tools that help with this analysis 
> (https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal).
> 
> At first glance it looks like these tools would give diffs that are *larger* 
> than the 47K one we're struggling to review here. 
>  
> By remaining in this state, we hold others up from making changes, or we 
> increase the amount of work needed after merging to port over changes that 
> may be missed. If we move forward, new changes can be done on top of the new 
> website.
> 
> I agree we don't want to hold others up from making changes. However, the 
> amount of work to port changes over seems small in comparison to everything 
> else that is being discussed here. (It also provides good incentives to reach 
> the bar quickly and has the advantage of falling on the right people.) (3) 
> will still take some time. 
> 
> If we go this route, we're lowering the bar for doc changes, but not removing 
> it. 
>  
> (3) This makes sense. Brian, would you be able to spend some time to look at 
> the automation changes (build files and scripts) to ensure they look fine?
> 
> I would also like to write a post mortem to extract lessons learned and avoid 
> this situation in the future.
> 
> 
> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com> wrote:
> I'm -0 on merging as-is. I have the same concerns as Robert and he's voiced 
> them very well so I won't waste time re-airing them.
> 
> (2) I spot checked the content, pulled out some common patterns, and
> it mostly looks good, but there were also some issues (e.g. several
> pages were replaced with the contents from entirely different pages).
> I would be more comfortable if, say, a smoke test of comparing the old
> and new sites, with html tags stripped and ignoring whitespace,
> yielded what should be empty diffs.
> 
> Can you share any details about this analysis?
> 
> It was basically paging through the diff, adding things to the sed script, 
> and then looking at more diffs. 
>  
> +1 for verifying the old and new are the same by diffing the output.
>  
> (3) It'd be good to have someone give a stamp of approval on the
> infrastructure changes, at least to validate that we're not going to
> be taking on extra tech debt with regard to jenkins stability and
> developer workflow. I see that Brian has at least looked at this some.
> 
> My involvement so far was just recognizing a problem (creating root-owned 
> files on jenkins workers) and helping to fix it. If there's anyone available 
> who's familiar with the website infrastructure it would be great if they 
> could take a look instead (if not I could probably acquaint myself enough to 
> review).
> 
> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com> wrote:
> This is a tough situation.
> 
> It would have been much better if this transition was structured in
> such a way that the review was more manageable (e.g. the suggestion of
> scripts, not mixing in voluminous unnecessary changes like whitespace,
> and not updating content), and possibly even incrementally (e.g. the
> new site would have been developed over multiple PRs in a subdomain or
> subdirectory while being worked on). But hindsight is 20/20 and no
> one, myself included, thought to bring this up when the original
> migration was proposed, so this is more something to keep in mind for
> the future. I also appreciate the efforts that have been made to clean
> things up (e.g. preserving history) and address feedback.
> 
> So, where do we go from here? My first thought is that I really don't
> want to set a precedent that just because a PR "will require a large
> effort" and is in a state where, if we don't "move forward and merge what
> we have now," then "work done so far will be lost," we think
> it's OK to forgo doing a proper review.
> 
> On the other hand, there are some mitigating factors with this being
> the website and not the code in that "bugs," though possibly
> embarrassing, won't break production pipelines or cause data loss, and
> though the source is technically part of the release, when we find
> something to fix we can fix the live website much more quickly than go
> through the whole release process and convince people to upgrade. (I
> recognize accepting this argument is, to some degree at least, saying
> that we don't care about the correctness of docs as much as so-called
> "real" code, if we go there.)
> 
> If we decide to go ahead and merge (and I would not object), there are
> some things I would like to see.
> 
> (1) I would like to understand what we would do afterwards to "review
> the outcome, and ensure that all the content is there," and why it
> can't be done before merging instead. (Is it because it'd take time
> and we don't want to incorporate changes that are made to the website
> in the meantime? I think that boat has sailed, but maybe we can avoid
> making it worse...)
> 
> (2) I spot checked the content, pulled out some common patterns, and
> it mostly looks good, but there were also some issues (e.g. several
> pages were replaced with the contents from entirely different pages).
> I would be more comfortable if, say, a smoke test of comparing the old
> and new sites, with html tags stripped and ignoring whitespace,
> yielded what should be empty diffs.
> 
> (3) It'd be good to have someone give a stamp of approval on the
> infrastructure changes, at least to validate that we're not going to
> be taking on extra tech debt with regard to jenkins stability and
> developer workflow. I see that Brian has at least looked at this some.
> 
> - Robert
> 
> 
> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> >
> > Thank you Ahmet.
> >
> > Robert/Brian, what do you think?
> >
> > The website staging and pre commit tests have passed [1]. If nobody has 
> > objections, we could merge it soon.
> >
> >
> > [1] https://github.com/apache/beam/pull/11554
> >
> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com> wrote:
> >>
> >>
> >>
> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> >>>
> >>> Thanks for the writeup Ahmet.
> >>>
> >>> My bias is to move forward and merge the PR. After this, we'll review the 
> >>> outcome, and ensure that all the content is there. Nam will help us with 
> >>> that.
> >>> The reason that I'd like to move forward and merge what we have now - is 
> >>> that if we don't do that, the work done so far will be lost.
> >>> We'll make sure to stage the website in its current state, and use that 
> >>> as reference/archive to ensure all the content has been moved.
> >>>
> >>> Is this reasonable to everyone?
> >>
> >>
> >> This is reasonable to me. I agree with your reasons.
> >>
> >> What do others think?
> >>
> >>>
> >>>
> >>>
> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>> > 1) Currently, the main blocker for merging is Staging Test Failures.
> >>>>>>
> >>>>>> That and finishing the review. (Is someone tracking/coordinating this?)
> >>>>>
> >>>>>
> >>>>> I am coordinating the work on the failed tests, but I would need other 
> >>>>> committers' help to perform the review. @Ahmet, could you help us 
> >>>>> prioritize the review for this PR?
> >>>>
> >>>>
> >>>> The problem is there are too many manual changes. Reviewing this change 
> >>>> in this form will require a large effort. I do not think I can interrupt 
> >>>> other projects to prioritize reviews on this PR. IMO, we have a few 
> >>>> options:
> >>>>
> >>>> - PR to be restructured in the format suggested in this thread. A commit 
> >>>> for infrastructure changes from Jekyll to hugo. A second commit for a 
> >>>> script that will convert the majority of the content. A third commit for 
> >>>> the execution of the script. And a fourth commit for the additional 
> >>>> manual content changes. If Nam can get to this form, people on this 
> >>>> thread (myself/Robert/Pablo/Brian) can review the changes.
> >>>> - Another option is, we can accept that we already invested in this 
> >>>> transition and overall this is a good change, and merge the PR more or 
> >>>> less in its current form (with tests fixed and open comments addressed) 
> >>>> even though it has issues. And then, over time, fix the issues we 
> >>>> encounter. There was already some amount of review and visual 
> >>>> comparisons, we risk losing some recent content changes but I am 
> >>>> assuming this will not be much. If Nam can commit to compare two sites 
> >>>> after a merge, fixing the majority of the delta, this might be a viable 
> >>>> option.
> >>>>
> >>>> Another thing we can do is archive/store a read-only copy of the 
> >>>> current website in an "archive" url temporarily instead of completely 
> >>>> deleting it. It will give us a baseline for a while to go back to the 
> >>>> old content and move any missing data. (And maybe, someone can come up 
> >>>> with an innovative way to compare the textual content of both sites.) A 
> >>>> note on the stop-the-world approach: I believe we are already failing on 
> >>>> that with merge conflicts showing up on the PR. It will be better for us 
> >>>> to complete the transition as soon as possible. Fixing after the initial 
> >>>> merge might be a simpler task, especially if we can archive the old site.
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> > Michal showed Nam how to handle the 1st test which was about Apache 
> >>>>>> > License missing.
> >>>>>> >
> >>>>>> > However, the 2nd and 3rd tests looked like some kind of permissions 
> >>>>>> > error on the Jenkins worker, not something configured by code. For more 
> >>>>>> > details based on the Jenkins logs, the 2nd test failed because of 
> >>>>>> > website/www/site/themes and the 3rd test failed because of 
> >>>>>> > website/www/node_modules, they are both auto-generated files on 
> >>>>>> > build. Can someone help Nam to look into this?
> >>>>>> >
> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE
> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") — FAILURE
> >>>>>> >
> >>>>>> > 2) Are there any other blockers for merging? @Ahmet/Robert/others 
> >>>>>> > please share if there are any other blockers.
> >>>>>> >
> >>>>>> >
> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
> >>>>>> >
> >>>>>> >
> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <rober...@google.com> wrote:
> >>>>>> >>
> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <al...@google.com> wrote:
> >>>>>> >> >
> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>>>> >> >>>
> >>>>>> >> >>> I took the massive commit and split it up into:
> >>>>>> >> >>>
> >>>>>> >> >>> (1) Infrastructure changes (basically everything outside of
> >>>>>> >> >>> website/www/site/content),
> >>>>>> >> >>> (2) Sed script changes, and
> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> > Thank you Robert. This makes it much easier. What is the source 
> >>>>>> >> > of the sed script? I am not sure why some of those lines are 
> >>>>>> >> > there. It would be much easier for us to comment on the script 
> >>>>>> >> > source if it is reviewable somewhere.
> >>>>>> >>
> >>>>>> >> I just gathered up common patterns as I was trying to go through and
> >>>>>> >> review the files... Mostly it was an exercise in finding a compact
> >>>>>> >> representation for the delta, not trying to be a perfect conversion.
> >>>>>> >> (I do think in retrospect, if we do something like this again, it
> >>>>>> >> would be preferable to commit a script that does the auto-conversion
> >>>>>> >> (maybe even with some patch files for manual changes) both for ease
> >>>>>> >> of reviewing and to avoid the stop-the-world situation we're in now.
> >>>>>> >> I'm still worried that some changes will get lost in the shuffle.)
