I am also definitely in favor of a single repository. Perhaps I'm just misunderstanding why the generated docs must be put in a git repository at all. Is it because that's the easiest way to serve them?

On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:

> Alan found the place where website publishing is configured [1], which has
> examples of project sites being configured with more than one git root.
> This is great for us because it allows us to leave generated
> javadocs/pydocs in the beam-site repository and publish website markdown
> content from the main repo.
>
> Alan has a PR ready to publish generated HTML in a post-commit job [2].
> Once that goes through the last step is to upgrade the publishing config.
>
> [1]
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
> [2] https://github.com/apache/beam/pull/6431
>
> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sweg...@google.com> wrote:
>
>> > We could add a new default branch (master?) and keep all the
>> non-generated files (src/) there, and put generated files (content/) in the
>> asf-site branch (like we already do).
>>
>> I'm strongly in favor of having sources in a single repository. We have
>> significant process and infrastructure built up for the apache/beam repo
>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>> website sources in the same repo. The current beam-site repo PR automation
>> is flaky because it was custom-built and not given the same level of
>> attention as the main repo.
>>
>> The caveat to consolidating website sources in the main repo is that it
>> incentivizes putting the generated sources branch on the same repo. I've
>> documented a few of the reasons in the Appendix of the design doc [1]:
>> - It's easier to maintain a single repository; easily apply existing
>> tooling/infrastructure
>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>> [2]
>>
>> My preference is to move forward with the migration of sources to
>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>> I like the idea of separating the publishing/hosting of generated
>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>> migration.
>>
>> [1]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>
>> [2]
>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>
>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>> and keep all the non-generated files (src/) there, and put generated files
>>> (content/) in the asf-site branch (like we already do).
>>> That way there's no confusion as to which files you should update.
>>> (This is of course assuming we still place generated docs in git repos.)
>>>
>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> My thought was to leave the asf-site branch in the beam-site
>>>> repository, add generated docs to that branch (until we have a better
>>>> solution), and have only sources in the beam repo.
>>>>
>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>> it would eliminate the need to place generated docs into git repos.
>>>>
>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>>> of the apache/beam-site repo. (gitpubsub:
>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>
>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>> perhaps there an alternative way to push updates to beam.apache.org?
>>>>> (not requiring the asf-site branch)
>>>>>
>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> Hi Scott,
>>>>>>
>>>>>> Thanks for bringing the discussion back here.
>>>>>>
>>>>>> I agree that we should separate the changes for hosting of generated
>>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>>> switch and fix the contributor headache soon.
>>>>>>
>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>> main beam repository (and keep on adding with every release) if we
>>>>>> continue
>>>>>> to serve the site from the old beam-site repo? (I left a comment the
>>>>>> doc.)
>>>>>>
>>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>> Reliability improvements; design doc here:
>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>
>>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>>> publishing mechanism. This maintains our current approach, the only
>>>>>>> change
>>>>>>> being that we're moving the asf-site branch from the retiring
>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>
>>>>>>> The concern for committing generated content is the extra overhead
>>>>>>> during git fetch. I did some analysis to measure the impact [2], and
>>>>>>> found
>>>>>>> that fetching a week of source + generated content history from
>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>
>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>>>> location like Flink does with buildbot, but that work is separable and
>>>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>>>> improve the reliability of automation for contributing website changes.
>>>>>>> At
>>>>>>> last measure, only about half of beam-site PR merges use Mergebot
>>>>>>> without
>>>>>>> experiencing some reliability issue [3].
>>>>>>>
>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>> [2]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>> [3]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <t...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Udi,
>>>>>>>>
>>>>>>>> Good to know you will continue this work.
>>>>>>>>
>>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>>> require generated documentation to be checked into the repo). Happy to
>>>>>>>> help
>>>>>>>> with that.
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>>> generated files in the master branch.
>>>>>>>>>
>>>>>>>>> However, I've been told that even putting generated files a
>>>>>>>>> separate branch could blow up the git repository for all (e.g. make
>>>>>>>>> git
>>>>>>>>> pulls a lot longer?).
>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>
>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <t...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>> occur prior to completing the merge and switching the web site to
>>>>>>>>>> the main
>>>>>>>>>> repo.
>>>>>>>>>> >
>>>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>>>> either of the repos/branches.
>>>>>>>>>>
>>>>>>>>>> Huge +1 to this. Thomas, would have time to set something like
>>>>>>>>>> this up
>>>>>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>>>>>
>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>> the ASF buildbot infrastructure.
>>>>>>>>>> >
>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>> >
>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>> >
>>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>>> >
>>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Thomas
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>>> mig...@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> Last time I talked with Scott I brought this idea in. I
>>>>>>>>>> believe the plan was either to publish compiled site to website
>>>>>>>>>> directly,
>>>>>>>>>> or keep it in separate storage from apache/beam repo.
>>>>>>>>>> >>
>>>>>>>>>> >> One of the main reasons not to check in compiled version of
>>>>>>>>>> website is that every developer will have to pull all the versions of
>>>>>>>>>> website every time they clone repo, which is not that good of an
>>>>>>>>>> idea to do.
>>>>>>>>>> >>
>>>>>>>>>> >> Regards,
>>>>>>>>>> >> --Mikhail
>>>>>>>>>> >>
>>>>>>>>>> >> Have feedback?
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>> tags are not necessary?
>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>> release branch should have the relevant docs (although perhaps it's
>>>>>>>>>> better
>>>>>>>>>> to put them in a different repo or storage system).
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>> generation be part of the website review process, as it takes up a
>>>>>>>>>> lot of
>>>>>>>>>> time when changes are staged (10s of thousands of files), especially
>>>>>>>>>> when a
>>>>>>>>>> PR is updated and existing staged files need to be deleted.
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>>> mig...@google.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> @Thomas: Migration work is in backlog and will be picked up
>>>>>>>>>> in near time.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Have feedback?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <t...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>> longer check in generated documentation into the repository?
Those
>>>>>>>>>> can be
>>>>>>>>>> generated and deployed independently (when a commit to a branch
>>>>>>>>>> occurs),
>>>>>>>>>> such as done in the Apex and Flink projects.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> I was told that Scott who was working in the beam-site
>>>>>>>>>> changes is on leave now and the migration is still pending (see note
>>>>>>>>>> at
>>>>>>>>>> https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>>> else going to pick it up?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>> >>>>> Thomas
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>>> pabl...@google.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>> every time we make a release, so that people are able to dive in and
>>>>>>>>>> find
>>>>>>>>>> the docs?
>>>>>>>>>> >>>>>> Best
>>>>>>>>>> >>>>>> -P.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>> old releases. It is unclear from Beam website which releases are
>>>>>>>>>> still
>>>>>>>>>> supported or not. It probably makes sense to drop documentation for
>>>>>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the
>>>>>>>>>> future I
>>>>>>>>>> can work on updating the Beam website to clarify the state of each
>>>>>>>>>> release.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>>> eh...@google.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>>> Github commit history.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>>> danolive...@google.com> wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>>>> of the website repository, right? If they're not currently used in
>>>>>>>>>> the
>>>>>>>>>> website and they're in the commit history then I don't see a reason
>>>>>>>>>> to save
>>>>>>>>>> them.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>>> eh...@google.com> wrote:
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes,
>>>>>>>>>> because it's
>>>>>>>>>> trying to deletes 22k files and then copy 22k files (warning large
>>>>>>>>>> file).
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>>> the older javadoc and pydoc files for older versions. Is there a good
>>>>>>>>>> reason to keep around this kind of documentation for older versions
>>>>>>>>>> (say 1
>>>>>>>>>> year back)?
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>> --
>>>>>>>>>> >>>>>> Got feedback? go/pabloem-feedback
>>>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>>
>>>>>>
>>
>> --
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>
> --
>
> Got feedback? tinyurl.com/swegner-feedback
>