Alan found the place where website publishing is configured [1], which has examples of project sites being configured with more than one git root. This is great for us because it allows us to leave generated javadocs/pydocs in the beam-site repository and publish website markdown content from the main repo.
Alan has a PR ready to publish generated HTML in a post-commit job [2]. Once that goes through, the last step is to update the publishing config.

[1] https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
[2] https://github.com/apache/beam/pull/6431

On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sweg...@google.com> wrote:

>> We could add a new default branch (master?) and keep all the non-generated files (src/) there, and put generated files (content/) in the asf-site branch (like we already do).
>
> I'm strongly in favor of having sources in a single repository. We have significant process and infrastructure built up for the apache/beam repo (for build, PR, CI, release, etc.) that we can take advantage of by putting website sources in the same repo. The current beam-site repo PR automation is flaky because it was custom-built and not given the same level of attention as the main repo.
>
> The caveat to consolidating website sources in the main repo is that it incentivizes putting the generated sources branch on the same repo. I've documented a few of the reasons in the Appendix of the design doc [1]:
> - It's easier to maintain a single repository, and we can easily apply existing tooling/infrastructure
> - Jenkins tooling for publishing generated HTML may not work cross-repo [2]
>
> My preference is to move forward with the migration of sources to apache/beam [master], and website generated HTML to apache/beam [asf-site]. I like the idea of separating the publishing/hosting of generated javadocs/pydocs since they add so much cruft, but it should not hold up the migration.
>
> [1] https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
> [2] https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>
> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>
>> Staying on beam-site SGTM. We could add a new default branch (master?) and keep all the non-generated files (src/) there, and put generated files (content/) in the asf-site branch (like we already do).
>> That way there's no confusion as to which files you should update.
>> (This is of course assuming we still place generated docs in git repos.)
>>
>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <t...@apache.org> wrote:
>>
>>> My thought was to leave the asf-site branch in the beam-site repository, add generated docs to that branch (until we have a better solution), and have only sources in the beam repo.
>>>
>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459, which would eliminate the need to place generated docs into git repos.
>>>
>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> I believe that beam.apache.org is populated from the asf-site branch of the apache/beam-site repo. (gitpubsub: https://www.apache.org/dev/project-site.html#intro)
>>>> If we move the markdown-based docs to apache/beam, leave generated javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam, then javadoc and pydoc will not get pushed to the website.
>>>>
>>>> Is there some place where we can push javadoc and pydoc files? Or perhaps there is an alternative way to push updates to beam.apache.org (not requiring the asf-site branch)?
>>>>
>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for bringing the discussion back here.
>>>>>
>>>>> I agree that we should separate the changes for hosting of generated java/pydocs from the rest of website automation so that we can make the switch and fix the contributor headache soon.
>>>>>
>>>>> But perhaps we can avoid adding 4M lines of generated code to the main beam repository (and keep on adding with every release) if we continue to serve the site from the old beam-site repo? (I left a comment in the doc.)
>>>>>
>>>>> About trying buildbot, as mentioned earlier I would be happy to help with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org> wrote:
>>>>>
>>>>>> Re-opening this thread as it came up today in the discussion for PR#6458 [1]. This PR is part of the work for Beam-Site Automation Reliability improvements; design doc here: https://s.apache.org/beam-site-automation
>>>>>>
>>>>>> The current plan is to keep generated javadoc/pydoc sources only on the asf-site branch, which is necessary for the current gitpubsub publishing mechanism. This maintains our current approach, the only change being that we're moving the asf-site branch from the retiring apache/beam-site repository into a new apache/beam repo branch.
>>>>>>
>>>>>> The concern with committing generated content is the extra overhead during git fetch. I did some analysis to measure the impact [2], and found that fetching a week of source + generated content history from apache/beam-site took 0.39 seconds.
>>>>>>
>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external location like Flink does with buildbot, but that work is separable and shouldn't be a prerequisite for this effort. The goal of this work is to improve the reliability of automation for contributing website changes. At last measure, only about half of beam-site PR merges use Mergebot without experiencing some reliability issue [3].
>>>>>>
>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of git. Thomas, would you have bandwidth to look into this?
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>> [2] https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>> [3] https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>
>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Udi,
>>>>>>>
>>>>>>> Good to know you will continue this work.
>>>>>>>
>>>>>>> Let me know if you want to try the buildbot route (which does not require generated documentation to be checked into the repo). Happy to help with that.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I'm picking up the website migration. The plan is to not include generated files in the master branch.
>>>>>>>>
>>>>>>>> However, I've been told that even putting generated files in a separate branch could blow up the git repository for everyone (e.g., make git pulls a lot longer?).
>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>
>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I think the separation of generated code will need to occur prior to completing the merge and switching the web site to the main repo.
>>>>>>>>>>
>>>>>>>>>> There should be no reason to check generated documentation into either of the repos/branches.
>>>>>>>>>
>>>>>>>>> Huge +1 to this. Thomas, would you have time to set something like this up for Beam? If not, could anyone else pick this up?
>>>>>>>>>
>>>>>>>>>> Please see as an example how this was solved in Flink, using the ASF buildbot infrastructure.
>>>>>>>>>>
>>>>>>>>>> Documentation per version/release, for example:
>>>>>>>>>>
>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>>
>>>>>>>>>> The buildbot configuration is here (requires committer access):
>>>>>>>>>>
>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mig...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Last time I talked with Scott I brought this idea up. I believe the plan was either to publish the compiled site to the website directly, or keep it in separate storage from the apache/beam repo.
>>>>>>>>>>>
>>>>>>>>>>> One of the main reasons not to check in the compiled version of the website is that every developer would have to pull all versions of the website every time they clone the repo, which is not a good idea.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> --Mikhail
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Pablo, the docs are generated into versioned paths, e.g., https://beam.apache.org/documentation/sdks/javadoc/2.5.0/, so tags are not necessary?
>>>>>>>>>>>> Also, once apache/beam-site is merged with apache/beam, the release branch should have the relevant docs (although perhaps it's better to put them in a different repo or storage system).
>>>>>>>>>>>>
>>>>>>>>>>>> Thomas, I would very much like to not have javadoc/pydoc generation be part of the website review process, as it takes up a lot of time when changes are staged (tens of thousands of files), especially when a PR is updated and existing staged files need to be deleted.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mig...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 for removing old documentation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Thomas: Migration work is in the backlog and will be picked up in the near future.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --Mikhail
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for removing pre-2.0 documentation (as well as the entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Isn't it part of the beam-site changes that we will no longer check generated documentation into the repository? Those can be generated and deployed independently (when a commit to a branch occurs), as is done in the Apex and Flink projects.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was told that Scott, who was working on the beam-site changes, is on leave now and the migration is still pending (see note at https://github.com/apache/beam/tree/master/website). Is anyone else going to pick it up?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pabl...@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it worth adding a tag / branch to the repositories every time we make a release, so that people are able to dive in and find the docs?
>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>> -P.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would guess that users are still using some of these old releases. It is unclear from the Beam website which releases are still supported. It probably makes sense to drop documentation for releases < 2.0. (I would suggest keeping docs for 2.0.) For the future, I can work on updating the Beam website to clarify the state of each release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The older docs are not directly linked to and are in GitHub commit history.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If there are no objections, I'm going to delete javadocs and pydocs for releases older than 1 year, meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <danolive...@google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The older docs should be recorded in the commit history of the website repository, right? If they're not currently used in the website and they're in the commit history, then I don't see a reason to save them.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's trying to delete 22k files and then copy 22k files (warning: large file).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems that we could save a lot of time by deleting the javadoc and pydoc files for older versions. Is there a good reason to keep around this kind of documentation for older versions (say 1 year back)?
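The cleanup discussed in this thread (dropping javadoc/pydoc for releases older than 2.0 while, as Ahmet suggested, keeping the 2.0 docs) works because the docs live in versioned directories such as documentation/sdks/javadoc/2.5.0/. What follows is a minimal Python sketch of selecting those directories; it is not code from the Beam repository. The helper names `version_key`, `stale_versions`, and `stale_dirs`, the local docs path, and the exclusive default cutoff of 2.0.0 are all illustrative assumptions.

```python
from pathlib import Path


def version_key(name):
    """Turn a directory name like "2.5.0" into a comparable tuple.

    Trailing zeros are dropped so that "2.0" and "2.0.0" compare equal.
    Returns None for names that are not dotted integers (e.g. "current").
    """
    parts = name.split(".")
    if not all(p.isdecimal() for p in parts):
        return None
    key = [int(p) for p in parts]
    while key and key[-1] == 0:
        key.pop()
    return tuple(key)


def stale_versions(names, cutoff="2.0.0"):
    """Return the version names strictly older than `cutoff`, oldest first.

    Hypothetical helper: the cutoff version itself is kept, matching the
    suggestion to retain the 2.0 docs. Non-version names are ignored.
    """
    limit = version_key(cutoff)
    if limit is None:
        raise ValueError(f"cutoff is not a version: {cutoff!r}")
    older = []
    for name in names:
        key = version_key(name)
        if key is not None and key < limit:
            older.append((key, name))
    return [name for _, name in sorted(older)]


def stale_dirs(docs_root, cutoff="2.0.0"):
    """List (but do not delete) stale version directories under `docs_root`.

    `docs_root` is an assumed local path to a directory whose children are
    per-version doc folders; deleting them is left to the caller.
    """
    root = Path(docs_root)
    names = [p.name for p in root.iterdir() if p.is_dir()]
    return [root / name for name in stale_versions(names, cutoff)]
```

The sketch only lists candidates; deleting them (for example with `shutil.rmtree`) stays a separate, reviewable step. Comparing integer tuples rather than raw strings matters here, because a plain string comparison would sort "10.0.0" before "2.0.0".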