I am also definitely in favor of a single repository. Perhaps I'm just
misunderstanding why the generated docs must be put in a git repository at
all--is it because that's the easiest way to serve them?

On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner <sc...@apache.org> wrote:

> Alan found the place where website publishing is configured [1], which has
> examples of project sites being configured with more than one git root.
> This is great for us because it allows us to leave generated
> javadocs/pydocs in the beam-site repository and publish website markdown
> content from the main repo.
>
> Alan has a PR ready to publish generated HTML in a post-commit job [2].
> Once that goes through, the last step is to upgrade the publishing config.
>
> [1]
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
> [2] https://github.com/apache/beam/pull/6431
>
> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sweg...@google.com> wrote:
>
>> > We could add a new default branch (master?) and keep all the
>> non-generated files (src/) there, and put generated files (content/) in the
>> asf-site branch (like we already do).
>>
>> I'm strongly in favor of having sources in a single repository. We have
>> significant process and infrastructure built up for the apache/beam repo
>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>> website sources in the same repo. The current beam-site repo PR automation
>> is flaky because it was custom-built and not given the same level of
>> attention as the main repo.
>>
>> The caveat to consolidating website sources in the main repo is that it
>> incentivizes putting the generated sources branch on the same repo. I've
>> documented a few of the reasons in the Appendix of the design doc [1]:
>> - It's easier to maintain a single repository and apply existing
>>   tooling/infrastructure
>> - Jenkins tooling for publishing generated HTML may not work cross-repo [2]
>>
>> My preference is to move forward with the migration of sources to
>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>> I like the idea of separating the publishing/hosting of generated
>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>> migration.
>>
>> [1]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>
>> [2]
>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>
>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>> and keep all the non-generated files (src/) there, and put generated files
>>> (content/) in the asf-site branch (like we already do).
>>> That way there's no confusion as to which files you should update.
>>> (This is of course assuming we still place generated docs in git repos.)
>>>
>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> My thought was to leave the asf-site branch in the beam-site
>>>> repository, add generated docs to that branch (until we have a better
>>>> solution), and have only sources in the beam repo.
>>>>
>>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>>> it would eliminate the need to place generated docs into git repos.
>>>>
>>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>>> of the apache/beam-site repo. (gitpubsub:
>>>>> https://www.apache.org/dev/project-site.html#intro)
>>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>>> then javadoc and pydoc will not get pushed to the website.
>>>>>
>>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>>> perhaps there is an alternative way to push updates to beam.apache.org?
>>>>> (not requiring the asf-site branch)
>>>>>
>>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> Hi Scott,
>>>>>>
>>>>>> Thanks for bringing the discussion back here.
>>>>>>
>>>>>> I agree that we should separate the changes for hosting of generated
>>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>>> switch and fix the contributor headache soon.
>>>>>>
>>>>>> But perhaps we can avoid adding 4m lines of generated code to the
>>>>>> main beam repository (and adding more with every release) if we
>>>>>> continue to serve the site from the old beam-site repo? (I left a
>>>>>> comment in the doc.)
>>>>>>
>>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>>> Reliability improvements; design doc here:
>>>>>>> https://s.apache.org/beam-site-automation
>>>>>>>
>>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>>> the asf-site branch, which is necessary for the current gitpubsub
>>>>>>> publishing mechanism. This maintains our current approach, the only
>>>>>>> change being that we're moving the asf-site branch from the retiring
>>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>>
>>>>>>> The concern with committing generated content is the extra overhead
>>>>>>> during git fetch. I did some analysis to measure the impact [2], and
>>>>>>> found that fetching a week of source + generated content history from
>>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>>
>>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>>>> location like Flink does with buildbot, but that work is separable and
>>>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>>>> improve the reliability of automation for contributing website changes.
>>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>>> without experiencing some reliability issue [3].
>>>>>>>
>>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>>> [2]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>>> [3]
>>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <t...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Udi,
>>>>>>>>
>>>>>>>> Good to know you will continue this work.
>>>>>>>>
>>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>>> require generated documentation to be checked into the repo). Happy to
>>>>>>>> help with that.
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>>> generated files in the master branch.
>>>>>>>>>
>>>>>>>>> However, I've been told that even putting generated files in a
>>>>>>>>> separate branch could blow up the git repository for all (e.g. make
>>>>>>>>> git pulls a lot longer?).
>>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>>
>>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <t...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Yes, I think the separation of generated code will need to
>>>>>>>>>> > occur prior to completing the merge and switching the web site to
>>>>>>>>>> > the main repo.
>>>>>>>>>> >
>>>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>>>> > either of the repos/branches.
>>>>>>>>>>
>>>>>>>>>> Huge +1 to this. Thomas, would you have time to set something like
>>>>>>>>>> this up for Beam? If not, could anyone else pick this up?
>>>>>>>>>>
>>>>>>>>>> > Please see as an example how this was solved in Flink, using
>>>>>>>>>> > the ASF buildbot infrastructure.
>>>>>>>>>> >
>>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>>> >
>>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>>> >
>>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>>> >
>>>>>>>>>> > https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Thomas
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mig...@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> Last time I talked with Scott, I brought this idea up. I believe
>>>>>>>>>> >> the plan was either to publish the compiled site to the website
>>>>>>>>>> >> directly or to keep it in storage separate from the apache/beam repo.
>>>>>>>>>> >>
>>>>>>>>>> >> One of the main reasons not to check in the compiled version of the
>>>>>>>>>> >> website is that every developer would have to pull all versions of
>>>>>>>>>> >> the website every time they clone the repo, which is not a good idea.
>>>>>>>>>> >>
>>>>>>>>>> >> Regards,
>>>>>>>>>> >> --Mikhail
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>>> >>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so
>>>>>>>>>> >>> tags are not necessary?
>>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>>> >>> release branch should have the relevant docs (although perhaps it's
>>>>>>>>>> >>> better to put them in a different repo or storage system).
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>>> >>> generation be part of the website review process, as it takes up a
>>>>>>>>>> >>> lot of time when changes are staged (10s of thousands of files),
>>>>>>>>>> >>> especially when a PR is updated and existing staged files need to be
>>>>>>>>>> >>> deleted.
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mig...@google.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> +1 for removing old documentation.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> @Thomas: Migration work is in the backlog and will be picked up
>>>>>>>>>> >>>> soon.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> --Mikhail
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>>> >>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>>> >>>>> longer check generated documentation into the repository? Those
>>>>>>>>>> >>>>> can be generated and deployed independently (when a commit to a
>>>>>>>>>> >>>>> branch occurs), as is done in the Apex and Flink projects.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> I was told that Scott, who was working on the beam-site
>>>>>>>>>> >>>>> changes, is on leave now and the migration is still pending (see
>>>>>>>>>> >>>>> the note at https://github.com/apache/beam/tree/master/website).
>>>>>>>>>> >>>>> Is anyone else going to pick it up?
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks,
>>>>>>>>>> >>>>> Thomas
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pabl...@google.com> wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories
>>>>>>>>>> >>>>>> every time we make a release, so that people are able to dive in
>>>>>>>>>> >>>>>> and find the docs?
>>>>>>>>>> >>>>>> Best
>>>>>>>>>> >>>>>> -P.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> I would guess that users are still using some of these
>>>>>>>>>> >>>>>>> old releases. It is unclear from the Beam website which releases
>>>>>>>>>> >>>>>>> are still supported. It probably makes sense to drop
>>>>>>>>>> >>>>>>> documentation for releases < 2.0. (I would suggest keeping docs
>>>>>>>>>> >>>>>>> for 2.0.) For the future, I can work on updating the Beam website
>>>>>>>>>> >>>>>>> to clarify the state of each release.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in the
>>>>>>>>>> >>>>>>>> GitHub commit history.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>>> >>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <danolive...@google.com> wrote:
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>>>> >>>>>>>>> of the website repository, right? If they're not currently used
>>>>>>>>>> >>>>>>>>> in the website and they're in the commit history then I don't
>>>>>>>>>> >>>>>>>>> see a reason to save them.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>>> >>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes,
>>>>>>>>>> >>>>>>>>>> because it's trying to delete 22k files and then copy 22k files
>>>>>>>>>> >>>>>>>>>> (warning: large file).
>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>>> >>>>>>>>>> the javadoc and pydoc files for older versions. Is there a
>>>>>>>>>> >>>>>>>>>> good reason to keep around this kind of documentation for older
>>>>>>>>>> >>>>>>>>>> versions (say 1 year back)?
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>>
