The new repo is 100MB smaller:

$ gh repo clone apache/pulsar -- --single-branch --depth=1
$ du -sh pulsar
53M    pulsar
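To see which parts of a checkout make up the remaining size, a small shell helper can list the largest top-level entries, including `.git`. This is a sketch and was not part of the thread. The function name `largest_entries` is hypothetical, and it assumes a `du` and a `sort` that support `-h` (GNU coreutils, or a recent BSD/macOS userland).

```shell
#!/bin/sh
# Hypothetical helper (not part of the Pulsar tooling): list the largest
# top-level entries of a directory, biggest first.
# Assumes `du -h` and `sort -h` are available (GNU coreutils or recent BSD).
largest_entries() {
  dir="$1"
  count="${2:-10}"
  # One summary line per entry, including dot-entries such as .git.
  # Errors for globs that match nothing are discarded.
  du -sh "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rh | head -n "$count"
}

# Example: the five largest entries of a shallow Pulsar clone.
if [ -d pulsar ]; then
  largest_entries pulsar 5
fi
```

Run it against both a `--depth=1` clone and a full clone. The comparison shows how much of the size comes from history in `.git` and how much from files in the working tree.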

Best,
tison.


On Thu, Dec 29, 2022 at 21:01, tison <wander4...@gmail.com> wrote:

> Landed.
>
> Best,
> tison.
>
>
> On Thu, Dec 29, 2022 at 17:51, tison <wander4...@gmail.com> wrote:
>
>> Here are the related PRs:
>>
>> * https://github.com/apache/pulsar/pull/19100
>> * https://github.com/apache/pulsar-site/pull/348
>>
>> Best,
>> tison.
>>
>>
>> On Mon, Dec 26, 2022 at 21:45, tison <wander4...@gmail.com> wrote:
>>
>>> FYI, the tracking issue has been created:
>>> https://github.com/apache/pulsar/issues/19064
>>>
>>> I plan to finish it by the end of next month.
>>>
>>> Best,
>>> tison.
>>>
>>>
>>> On Wed, Dec 21, 2022 at 11:33, tison <wander4...@gmail.com> wrote:
>>>
>>>> Thanks for your feedback!
>>>>
>>>> @Yu
>>>>
>>>> Thanks for sharing the previous thread. I looped in @michaeljmarshall
>>>> here.
>>>>
>>>> @Jun
>>>>
>>>> It's possible, but it introduces a new shortcoming: we would then have to
>>>> tell contributors that the versioned docs live in a different place from
>>>> the NEXT version docs, lol.
>>>>
>>>> If our developers don't object to the separated sources, as @Asaf
>>>> commented:
>>>>
>>>> > We can take, let's say, five features and see if they were actually
>>>> > done in the same PR or separate PR. I guess that most documentation is
>>>> > actually updated separately. Thus, from that perspective, maybe it’s
>>>> > not a con.
>>>>
>>>> then we can do this refactor thoroughly.
>>>>
>>>> Also, if we somehow keep some of the sources in the main repo, we still
>>>> have these shortcomings:
>>>>
>>>> 1. Duplicated CI workflows.
>>>> 2. Cumbersome preview scaffolding in the main repo.
>>>>
>>>> ... which are exactly what I originally set out to overcome.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>>
>>>> On Wed, Dec 21, 2022 at 11:19, Jun Ma <momoma...@hotmail.com> wrote:
>>>>
>>>>> Is it possible to come up with a compromise solution that keeps the
>>>>> pros of both sides while minimizing the side effects? I'm thinking maybe
>>>>> it's not necessary to sacrifice the current contribution process, as long
>>>>> as we can greatly reduce the load of back-end actions and the source size.
>>>>> For example, if we move only the versioned docs to the site repo but keep
>>>>> the source of the NEXT docs in the pulsar repo, would we still win a large
>>>>> share of those pros while people can contribute as usual?
>>>>>
>>>>> ________________________________
>>>>> From: Jiaqi Shen <gleiphir2...@gmail.com>
>>>>> Sent: Tuesday, December 20, 2022 17:15
>>>>> To: dev@pulsar.apache.org <dev@pulsar.apache.org>
>>>>> Subject: Re: [PROPOSAL] Website precommit and move the source of docs to the site repo
>>>>>
>>>>> +1, it makes sense to me.
>>>>>
>>>>> Thanks,
>>>>> Jiaqi Shen
>>>>>
>>>>>
>>>>> On Mon, Dec 19, 2022 at 20:57, Yu <li...@apache.org> wrote:
>>>>>
>>>>> > Hi tison,
>>>>> >
>>>>> > Thanks for raising this up!
>>>>> >
>>>>> > Our community had a similar discussion previously and, at that time,
>>>>> > chose to "keep" the docs in the Pulsar main repo.
>>>>> >
>>>>> > [1] lists the pros and cons of "keep" and "not keep" solutions.
>>>>> >
>>>>> > I'm +0 on this proposal because I think the total scores of these two
>>>>> > solutions are almost equal after weighing the pros and cons.
>>>>> >
>>>>> > ~~~~~~~~~~~~~~~~~~~~
>>>>> >
>>>>> > [1] https://lists.apache.org/thread/mf2xwntfgn84dq78ksqv22jk3drq6xb3
>>>>> >
>>>>> >
>>>>> > On Mon, Dec 19, 2022 at 5:40 PM tison <wander4...@gmail.com> wrote:
>>>>> >
>>>>> > > Thanks for your feedback!
>>>>> > >
>>>>> > > @Asaf
>>>>> > >
>>>>> > > > pre-commit
>>>>> > >
>>>>> > > I mean CI checks that run before a patch is merged. Currently, we don't
>>>>> > > check the docs content before merging it, which has let in a series of
>>>>> > > syntax errors and broken links. If we keep the docs under the site2
>>>>> > > folder in the main repo and then copy them to the site repo, we have to
>>>>> > > build such CI checks in two places. What's worse, the checks in the main
>>>>> > > repo would be quite cumbersome (you would need if-else logic across the
>>>>> > > whole Pulsar CI workflows, and would have to run the sync sequentially
>>>>> > > in that workflow).
>>>>> > >
>>>>> > > If we host the source of the docs only in the site repo, we can extend
>>>>> > > the "precommit" workflow[1] I added recently to also check for syntax
>>>>> > > errors and broken links.
>>>>> > >
>>>>> > > > What does the apache/pulsar-site repo contain today?
>>>>> > >
>>>>> > > This is covered by the documentation guide page[2]. The site repo holds
>>>>> > > the source of the official website, while the user docs are synced from
>>>>> > > the main repo.
>>>>> > >
>>>>> > > > What content do we have today in the pulsar repo related to the site?
>>>>> > >
>>>>> > > With issue-18014[3] done, the main repo hosts only the user docs and
>>>>> > > some JSON metadata, which are synced by site_syncer.py[4].
>>>>> > >
>>>>> > > > Can you explain that better? Are you saying pulsar source JARs contain
>>>>> > > > the documentation?
>>>>> > >
>>>>> > > No. Source JARs contain only the Java files and the necessary
>>>>> > > copyright info. I mean the source release, for example,
>>>>> > > https://archive.apache.org/dist/pulsar/pulsar-2.10.2/apache-pulsar-2.10.2-src.tar.gz,
>>>>> > > which extracts to 173M, of which 129M is occupied by the site2 folder.
>>>>> > >
>>>>> > > This also affects developers when they git clone the repo.
>>>>> > >
>>>>> > > > I mean, if you wish to document a bug fix in 2.9.x, for example,
>>>>> > > > would you do it in the 2.9.x branch under site2/docs or
>>>>> > > > site2/website/versioned_docs/2.9.5?
>>>>> > >
>>>>> > > That's a separate question. Ideally, we would host the docs for each
>>>>> > > version in the corresponding branch, like Apache Flink does, as I
>>>>> > > mentioned[5]. But we don't: in practice, we update the versioned docs
>>>>> > > on the master branch so that they can be synced properly.
>>>>> > >
>>>>> > > See also the "Alternatives" section in the original email.
>>>>> > >
>>>>> > > @All
>>>>> > >
>>>>> > > Since there are no objections to the cons listed above and no new ones
>>>>> > > have been raised, I'm going to create a tracking issue later this week
>>>>> > > and open PRs showing what will change, for further review.
>>>>> > >
>>>>> > > Best,
>>>>> > > tison.
>>>>> > >
>>>>> > > [1] https://github.com/apache/pulsar-site/blob/f7abc615d57d9846ed093922d24bff952dc0e838/.github/workflows/ci-precommit.yml
>>>>> > > [2] https://pulsar.apache.org/contribute/document-contribution/#source-repositories
>>>>> > > [3] https://github.com/apache/pulsar/issues/18014
>>>>> > > [4] https://github.com/apache/pulsar-site/blob/f7abc615d57d9846ed093922d24bff952dc0e838/tools/pytools/lib/execute/site_syncer.py
>>>>> > > [5] https://github.com/apache/flink/tree/master/docs
>>>>> > >
>>>>> > >
>>>>> > > On Mon, Dec 19, 2022 at 16:26, PengHui Li <peng...@apache.org> wrote:
>>>>> > >
>>>>> > > > +1
>>>>> > > >
>>>>> > > > I support moving them to the website repo.
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > > Penghui
>>>>> > > >
>>>>> > > > On Mon, Dec 19, 2022 at 12:04 PM Yunze Xu <y...@streamnative.io.invalid> wrote:
>>>>> > > >
>>>>> > > > > +1. The most significant point to me is that we can preview all the
>>>>> > > > > content of the website without synchronizing content from the
>>>>> > > > > apache/pulsar repo.
>>>>> > > > >
>>>>> > > > > Thanks,
>>>>> > > > > Yunze
>>>>> > > > >
>>>>> > > > > On Mon, Dec 19, 2022 at 9:53 AM Li Li <urf...@apache.org> wrote:
>>>>> > > > > >
>>>>> > > > > > +1, That’s a good idea.
>>>>> > > > > >
>>>>> > > > > > > On Dec 16, 2022, at 07:07, tison <wander4...@gmail.com> wrote:
>>>>> > > > > > >
>>>>> > > > > > > Hi,
>>>>> > > > > > >
>>>>> > > > > > > After several rounds of work on the build flow of our official
>>>>> > > > > > > website[1][2][3], the content sync and site build flow are now
>>>>> > > > > > > debuggable and reproducible.
>>>>> > > > > > >
>>>>> > > > > > > However, compared with the project layouts and workflows of other
>>>>> > > > > > > Apache projects' websites, the Pulsar site still faces two
>>>>> > > > > > > challenges:
>>>>> > > > > > >
>>>>> > > > > > > 1. We don't have a pre-commit workflow for any website-related
>>>>> > > > > > > changes. Thus, we don't detect broken links or syntax errors when
>>>>> > > > > > > reviewing new patches[4][5][6].
>>>>> > > > > > > 2. The website's content sits two levels down in
>>>>> > > > > > > `site2/website-next` for historical reasons, which is confusing
>>>>> > > > > > > for contributors.
>>>>> > > > > > >
>>>>> > > > > > > To overcome these two shortcomings, I propose the following:
>>>>> > > > > > >
>>>>> > > > > > > 1. Move the website's content to the root level, so that we have a
>>>>> > > > > > > first-class Docusaurus- and Yarn-based JS project layout, which is
>>>>> > > > > > > more convenient and familiar to the developers involved.
>>>>> > > > > > > 2. Host the source of the docs in the site repo (apache/pulsar-site)
>>>>> > > > > > > instead of keeping it under the `site2` folder in the main repo
>>>>> > > > > > > and syncing the content.
>>>>> > > > > > >
>>>>> > > > > > > Below are the pros and cons:
>>>>> > > > > > >
>>>>> > > > > > > Pros
>>>>> > > > > > >
>>>>> > > > > > > 1. Obviously, we get a pre-commit workflow. And since we host the
>>>>> > > > > > > source of the docs in one repo, we don't have to run the pre-commit
>>>>> > > > > > > workflow in two places, which would be quite cumbersome to
>>>>> > > > > > > implement.
>>>>> > > > > > > 2. The size of the main repo's source release can be reduced a lot.
>>>>> > > > > > > Currently, the site2 folder takes up 63MB of the 140MB of sources,
>>>>> > > > > > > and we can remove it entirely. In addition, every release carries
>>>>> > > > > > > the full set of versioned docs.
>>>>> > > > > > > 3. We can clean up a large portion of "integration" to
>>>>> debug the
>>>>> > > site
>>>>> > > > > > > brittlely on the main repo[7]  (etc.) and redundant
>>>>> contribution
>>>>> > > > > guide[8].
>>>>> > > > > > > This way, when updating docs, we can preview the result in
>>>>> one
>>>>> > repo
>>>>> > > > > instead
>>>>> > > > > > > of actually doing the sync on the fly. In addition, this
>>>>> > > integration
>>>>> > > > > blocks
>>>>> > > > > > > we move the website content to the top level since it makes
>>>>> > strong
>>>>> > > > > > > assumptions about the relative layout.
>>>>> > > > > > >
>>>>> > > > > > > Cons
>>>>> > > > > > >
>>>>> > > > > > > The most significant con is that we can no longer update the code
>>>>> > > > > > > and docs in one patch against apache/pulsar. You must open a
>>>>> > > > > > > separate pull request against apache/pulsar-site, cross-reference
>>>>> > > > > > > the two, and manage the merge order (synchronization).
>>>>> > > > > > >
>>>>> > > > > > > Alternatives:
>>>>> > > > > > >
>>>>> > > > > > > To resolve the versioned docs issue, an alternative is to host
>>>>> > > > > > > only the user docs along with each version, as Flink does[9].
>>>>> > > > > > > However, this both departs from the Docusaurus framework and
>>>>> > > > > > > requires significant development effort.
>>>>> > > > > > >
>>>>> > > > > > > Since this explicitly changes the development flow (that is, you
>>>>> > > > > > > would now update docs separately), I am starting this discussion
>>>>> > > > > > > here to gather your feedback.
>>>>> > > > > > >
>>>>> > > > > > > Comments are welcome!
>>>>> > > > > > >
>>>>> > > > > > > Best,
>>>>> > > > > > > tison.
>>>>> > > > > > >
>>>>> > > > > > > [1] https://pulsar.apache.org/
>>>>> > > > > > > [2] https://github.com/apache/pulsar-site
>>>>> > > > > > > [3] https://github.com/apache/pulsar/issues/18014
>>>>> > > > > > > [4] https://github.com/apache/pulsar/issues/17599
>>>>> > > > > > > [5] https://github.com/apache/pulsar/pull/17863#discussion_r990174850
>>>>> > > > > > > [6] https://github.com/apache/pulsar/pull/17853#discussion_r991803704
>>>>> > > > > > > [7] https://github.com/apache/pulsar/blob/b1f9e351fa4d5aba197d33cfc0c536516b55b61f/site2/website/start.sh
>>>>> > > > > > > [8] https://pulsar.apache.org/contribute/document-preview/#preview-documentation-changes
>>>>> > > > > > > [9] https://github.com/apache/flink/tree/master/docs
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>
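The pre-commit workflow discussed in this thread is meant to catch broken links before a docs patch is merged. The shell sketch below illustrates one such check. It reports inline Markdown links whose relative target file does not exist. It is not the actual `ci-precommit.yml` in apache/pulsar-site. The function name `check_relative_links` and the `docs` directory are hypothetical. The sketch also assumes that links point at files on disk, so it does not handle Docusaurus doc-ID links, reference-style links, or links with titles.

```shell
#!/bin/sh
# Hypothetical sketch of a broken-link check in the spirit of the
# pre-commit workflow discussed above (not its actual implementation).
# Prints one line per inline Markdown link whose relative target is missing.
check_relative_links() {
  find "$1" -type f -name '*.md' | while IFS= read -r file; do
    dir=$(dirname "$file")
    # Extract the target of every inline link of the form [text](target).
    grep -o ']([^)]*)' "$file" | sed 's/^](//; s/)$//' |
      while IFS= read -r target; do
        # Skip external URLs and in-page anchors; only relative files are checked.
        case "$target" in
          http://*|https://*|mailto:*|'#'*) continue ;;
        esac
        path=${target%%#*}  # drop a trailing "#anchor"
        if [ ! -e "$dir/$path" ]; then
          printf '%s: broken link -> %s\n' "$file" "$target"
        fi
      done
  done
}

# Example: report broken links under a hypothetical docs/ directory.
if [ -d docs ]; then
  check_relative_links docs
fi
```

In a CI job, the step could fail whenever the function prints anything, for example with `[ -z "$(check_relative_links docs)" ]`.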
