Re-opening this thread as it came up today in the discussion for PR#6458
[1]. This PR is part of the work for Beam-Site Automation Reliability
improvements; design doc here: https://s.apache.org/beam-site-automation

The current plan is to keep generated javadoc/pydoc sources only on the
asf-site branch, which the current githubpubsub publishing mechanism
requires. This preserves our current approach; the only change is that
we're moving the asf-site branch from the retiring apache/beam-site
repository to a new branch in the apache/beam repository.

The concern with committing generated content is the extra overhead it adds
to git fetch. I measured the impact [2] and found that fetching a week of
source + generated content history from apache/beam-site took 0.39 seconds.
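
For anyone who wants to reproduce a number like this, here is a minimal
sketch of one way to time such a fetch. It is not necessarily the method
used in [2]: the helper names, the use of `--shallow-since`, and the choice
of cutoff date are all assumptions made for illustration.

```python
import subprocess
import time


def build_fetch_command(remote_url, branch, since):
    """Build a `git fetch` command limited to commits newer than `since`.

    `since` is passed straight to git's --shallow-since option, so it
    should be a date git understands, e.g. "2018-09-14".
    """
    return ["git", "fetch", "--shallow-since=" + since, remote_url, branch]


def time_command(command, cwd):
    """Run `command` in directory `cwd` and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(command, cwd=cwd, check=True, capture_output=True)
    return time.perf_counter() - start
```

To use it, run `git init` in an empty directory and pass that directory as
`cwd`, with the repository URL, the branch name (for example asf-site), and
a date one week back as arguments. Note that a shallow fetch into a fresh
repository is only an approximation of what a contributor's incremental
fetch would cost.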

I like the idea of publishing javadoc/pydoc snapshots to an external
location like Flink does with buildbot, but that work is separable and
shouldn't be a prerequisite for this effort. The goal of this work is to
improve the reliability of automation for contributing website changes. At
last measure, only about half of beam-site PR merges use Mergebot without
experiencing some reliability issue [3].

I've opened BEAM-5459 [4] to track moving our generated docs out of git.
Thomas, would you have bandwidth to look into this?

[1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
[2]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
[3]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
[4] https://issues.apache.org/jira/browse/BEAM-5459

On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <t...@apache.org> wrote:

> Hi Udi,
>
> Good to know you will continue this work.
>
> Let me know if you want to try the buildbot route (which does not require
> generated documentation to be checked into the repo). Happy to help with
> that.
>
> Thomas
>
> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>
>> I'm picking up the website migration. The plan is to not include
>> generated files in the master branch.
>>
>> However, I've been told that even putting generated files in a separate
>> branch could blow up the git repository for all (e.g. make git pulls a lot
>> longer?).
>> Not sure if this is a real issue or not.
>>
>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <t...@apache.org> wrote:
>>> >
>>> > Yes, I think the separation of generated code will need to occur prior
>>> to completing the merge and switching the web site to the main repo.
>>> >
>>> > There should be no reason to check generated documentation into either
>>> of the repos/branches.
>>>
>>> Huge +1 to this. Thomas, would you have time to set something like this up
>>> for Beam? If not, could anyone else pick this up?
>>>
>>> > Please see as an example how this was solved in Flink, using the ASF
>>> buildbot infrastructure.
>>> >
>>> > Documentation per version/release, for example:
>>> >
>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>> >
>>> > The buildbot configuration is here (requires committer access):
>>> >
>>> >
>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>> >
>>> > Thanks,
>>> > Thomas
>>> >
>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <mig...@google.com>
>>> wrote:
>>> >>
>>> >> Last time I talked with Scott I brought this idea up. I believe the
>>> plan was either to publish the compiled site to the website directly, or
>>> keep it in storage separate from the apache/beam repo.
>>> >>
>>> >> One of the main reasons not to check in a compiled version of the
>>> website is that every developer would have to pull all versions of the
>>> website every time they clone the repo, which is not a good idea.
>>> >>
>>> >> Regards,
>>> >> --Mikhail
>>> >>
>>> >> Have feedback?
>>> >>
>>> >>
>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com> wrote:
>>> >>>
>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>> not necessary?
>>> >>> Also, once apache/beam-site is merged with apache/beam the release
>>> branch should have the relevant docs (although perhaps it's better to put
>>> them in a different repo or storage system).
>>> >>>
>>> >>> Thomas, I would very much like to not have javadoc/pydoc generation
>>> be part of the website review process, as it takes up a lot of time when
>>> changes are staged (10s of thousands of files), especially when a PR is
>>> updated and existing staged files need to be deleted.
>>> >>>
>>> >>>
>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <mig...@google.com>
>>> wrote:
>>> >>>>
>>> >>>> +1 For removing old documentation.
>>> >>>>
>>> >>>> @Thomas: Migration work is in the backlog and will be picked up in
>>> the near future.
>>> >>>>
>>> >>>> --Mikhail
>>> >>>>
>>> >>>> Have feedback?
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <t...@apache.org>
>>> wrote:
>>> >>>>>
>>> >>>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>> https://beam.apache.org/get-started/downloads/)
>>> >>>>>
>>> >>>>> Isn't it part of the beam-site changes that we will no longer
>>> check in generated documentation into the repository? Those can be
>>> generated and deployed independently (when a commit to a branch occurs),
>>> as is done in the Apex and Flink projects.
>>> >>>>>
>>> >>>>> I was told that Scott, who was working on the beam-site changes, is
>>> on leave now and the migration is still pending (see note at
>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>> going to pick it up?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Thomas
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <pabl...@google.com>
>>> wrote:
>>> >>>>>>
>>> >>>>>> Is it worth adding a tag / branch to the repositories every time
>>> we make a release, so that people are able to dive in and find the docs?
>>> >>>>>> Best
>>> >>>>>> -P.
>>> >>>>>>
>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <al...@google.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> I would guess that users are still using some of these old
>>> releases. It is unclear from the Beam website which releases are still
>>> supported. It probably makes sense to drop documentation for
>>> releases < 2.0. (I would suggest keeping docs for 2.0.) In the future I
>>> can work on updating the Beam website to clarify the state of each release.
>>> >>>>>>>
>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <eh...@google.com>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> The older docs are not directly linked to and are in Github
>>> commit history.
>>> >>>>>>>>
>>> >>>>>>>> If there are no objections I'm going to delete javadocs and
>>> pydocs for releases older than 1 year,
>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>> >>>>>>>>
>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>> danolive...@google.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> The older docs should be recorded in the commit history of the
>>> website repository, right? If they're not currently used in the website and
>>> they're in the commit history then I don't see a reason to save them.
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <eh...@google.com>
>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi all,
>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because it's
>>> trying to delete 22k files and then copy 22k files (warning large file).
>>> >>>>>>>>>>
>>> >>>>>>>>>> It seems that we could save a lot of time by deleting the
>>> older javadoc and pydoc files for older versions. Is there a good reason to
>>> keep around this kind of documentation for older versions (say 1 year back)?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>> --
>>> >>>>>> Got feedback? go/pabloem-feedback
>>> <https://goto.google.com/pabloem-feedback>
>>>
>>

-- 
Got feedback? tinyurl.com/swegner-feedback