Go ahead Utkarsh. It would be nice to work with you along this.

Thanks,
Amogh Desai

On Wed, Oct 25, 2023 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> +1. I think no one will object to improving the current situation :)
>
> On Wed, Oct 25, 2023 at 5:02 PM utkarsh sharma <utkarshar...@gmail.com>
> wrote:
>
> > Hey everyone,
> >
> > If we have a consensus on the suggestions in my previous email, I would
> > like to subdivide the task into smaller tickets and distribute them among
> > Aritra Basu, Amogh Desai, and myself.
> >
> > Thanks,
> > Utkarsh Sharma
> >
> > On Tue, Oct 24, 2023 at 10:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Those look like great ideas.
> > >
> > > On Tue, Oct 24, 2023 at 4:23 PM utkarsh sharma <utkarshar...@gmail.com>
> > > wrote:
> > >
> > > > > Just forgot to mention in my previous mail that I'm suggesting the above
> > > > > changes since storage is not the primary concern right now, but I'm
> > > > > happy to contribute either way. :)
> > > >
> > > > On Tue, Oct 24, 2023 at 7:43 PM utkarsh sharma <utkarshar...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey everyone,
> > > > >
> > > > > > I have a couple of tasks in mind that might help reduce the effort of
> > > > > > working with the docs. Right now, the tasks listed below are difficult
> > > > > > to achieve.
> > > > > >
> > > > > > 1. Adding a warning based on a specific provider/version of a
> > > > > > provider/range of providers, which was also the task that Ryan was
> > > > > > working on.
> > > > > > 2. Altering a page layout or CSS for a specific provider.
> > > > > >
> > > > > > These tasks are hard because the final product of building the docs
> > > > > > with *breeze build-docs* is a set of pre-prepared static files in the
> > > > > > docs/_build folder. Those files are self-sufficient to be hosted and
> > > > > > are used directly, leaving no room for customization of any sort.
> > > > >
> > > > >
> > > > > > My proposal would be to break down this process as follows:
> > > > > >
> > > > > > 1. We can prepare partial documents as part of *breeze build-docs*,
> > > > > > which are only responsible for providing the HTML to be populated
> > > > > > within the body tag for a specific provider, and not the layout of the
> > > > > > entire page.
> > > > > > 2. We then copy the partial static files to the airflow-site repo
> > > > > > within landing pages/site/layouts/docs. There, the layout of the page
> > > > > > will be provided by `single.html` and a listing of all the providers by
> > > > > > `list.html`, both of which are standard Hugo
> > > > > > <https://gohugo.io/about/what-is-hugo/> features. Also, using the
> > > > > > static files from `sphinx_airflow_theme`, which lives in the same repo,
> > > > > > makes CSS changes easy.
> > > > > > 3. We can then use Hugo to generate static
> > > > > > <https://gohugo.io/getting-started/quick-start/#publish-the-site>
> > > > > > files and push them to the `gh-pages` branch to publish them using
> > > > > > GitHub Pages.
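
A rough sketch of step 1 above, assuming a fully built Sphinx page is available as a string: it keeps only the inner HTML of the body tag and re-wraps it in a separately owned layout. `body_fragment`, `render_page` and `LAYOUT` are hypothetical names, and the Python `string.Template` layout only stands in for what Hugo's `single.html` would do; this is not how *breeze build-docs* works today.

```python
import re
from string import Template

# Matches everything between the opening and closing body tags.
_BODY = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)

# Hypothetical stand-in for a site-wide layout (Hugo's single.html in the proposal).
LAYOUT = Template(
    "<html><head><title>$title</title></head><body>$content</body></html>"
)


def body_fragment(full_page: str) -> str:
    """Return the inner HTML of <body>, or the input unchanged if there is none."""
    match = _BODY.search(full_page)
    return match.group(1).strip() if match else full_page


def render_page(fragment: str, title: str) -> str:
    """Wrap a partial document in the shared layout."""
    return LAYOUT.substitute(title=title, content=fragment)
```

With this split, a layout or CSS change touches only `LAYOUT`, while the per-provider fragments stay as they were built.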
> > > > >
> > > > >
> > > > > > Doing the above changes will enable us to do the following:
> > > > > >
> > > > > > 1. It will give us more control to work on a specific
> > > > > > provider/provider-version if we want, by providing templates -
> > > > > > https://gohugo.io/templates/lookup-order/
> > > > > > 2. We will have specific code to look at depending on the changes one
> > > > > > intends to make. Right now, if you don't know the flow, it's a bit
> > > > > > difficult to pinpoint the code to change.
> > > > > >    1. If we want to make changes to a specific provider's content, we
> > > > > >    can do it in the Airflow repo's docs/<provider>/*.rst files.
> > > > > >    2. If we have a change that affects multiple providers or versions,
> > > > > >    we can do it in the Airflow website's repo.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Utkarsh Sharma
> > > > >
> > > > > On Tue, Oct 24, 2023 at 3:45 PM Jarek Potiuk <ja...@potiuk.com>
> > wrote:
> > > > >
> > > > >> So it looks like we have some helping hands and we need someone to
> > > lead
> > > > it
> > > > >> :) (just saying).
> > > > >>
> > > > >> On Tue, Oct 24, 2023 at 8:15 AM Amogh Desai <
> > amoghdesai....@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > +1 (non binding) from me on the thought of moving the older docs
> > > (~18
> > > > >> > months seems ok) to an archive instead of the repository.
> > > > >> >
> > > > >> > Coming to the other problem of copying the built docs into
> > > > airflow-site
> > > > >> for
> > > > >> > releases, maybe we can fix that using a script? Open for
> thoughts
> > > > here.
> > > > >> >
> > > > >> > I would be very happy to help when we start taking this
> forward, I
> > > > have
> > > > >> > some experience in airflow-site and docs side as well. Feel free
> > to
> > > > >> reach
> > > > >> > out over email or slack :)
> > > > >> >
> > > > >> > Thanks & Regards,
> > > > >> > Amogh Desai
> > > > >> >
> > > > >> > On Mon, Oct 23, 2023 at 3:08 AM Aritra Basu <
> > > aritrabasu1...@gmail.com
> > > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > This definitely sounds like something that needs doing sooner
> > > rather
> > > > >> than
> > > > >> > > later.
> > > > >> > >
> > > > >> > > While I'd love to help, I'm not too experienced with this area
> > so
> > > I
> > > > >> might
> > > > >> > > not be able to actually propose what changes need doing, but
> if
> > > > >> someone
> > > > >> > has
> > > > >> > > a path forward on this I can definitely contribute some time
> to
> > > help
> > > > >> out
> > > > >> > > given some guidance on what is needed.
> > > > >> > >
> > > > >> > > --
> > > > >> > > Regards,
> > > > >> > > Aritra Basu
> > > > >> > >
> > > > >> > > On Mon, Oct 23, 2023, 2:19 AM Jarek Potiuk <ja...@potiuk.com>
> > > > wrote:
> > > > >> > >
> > > > >> > > > Some news here.
> > > > >> > > >
> > > > >> > > > I caught up with some infra changes that happened while I
> was
> > > > >> > travelling
> > > > >> > > -
> > > > >> > > > and I have just (with
> > > > >> https://github.com/apache/airflow-site/pull/879)
> > > > >> > > > switched the "airflow-site" building to the new, self-hosted
> > > > >> > > "asf-runners".
> > > > >> > > > This is a new option that ASF infra has given to test for
> the
> > > ASF
> > > > >> > > projects
> > > > >> > > > - rather than relying on "public runners", we can switch to
> > > > >> self-hosted
> > > > >> > > > runners donated by Microsoft to the ASF. More info here:
> > > > >> > > >
> > > > >> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=ASF+Infra+provided+self-hosted+runners
> > > > >> > > >
> > > > >> > > > The most important result is that we now have a lot more
> > > > "breathing
> > > > >> > > space"
> > > > >> > > > for the docs building job. During the build we are using max
> > 59%
> > > > of
> > > > >> the
> > > > >> > > > disk space - with 73GB used and 52GB free.
> > > > >> > > >
> > > > >> > > >  Filesystem      Size  Used Avail Use% Mounted on
> > > > >> > > >   overlay         124G   73G   52G  59% /
> > > > >> > > >
> > > > >> > > > This is - on one hand - good news (disk space is not an
> > "acute"
> > > > >> issue
> > > > >> > any
> > > > >> > > > more), I think if someone would like to work on improving
> the
> > > docs
> > > > >> > > building
> > > > >> > > > of ours, they have much more breathing space to do so.
> > > > >> > > > But - clearly - it might mean that the incentive to work on
> it
> > > > >> > decreased
> > > > >> > > -
> > > > >> > > > because it "just works". That's the bad effect of it. And I
> > > think
> > > > >> it's
> > > > >> > > not
> > > > >> > > > good, though the most I can do is to reiterate Ryan's
> concerns
> > > and
> > > > >> hope
> > > > >> > > we
> > > > >> > > > will get someone committing to improving this.
> > > > >> > > >
> > > > >> > > > I would strongly encourage those who want to improve it, to
> do
> > > > so. I
> > > > >> > > think
> > > > >> > > > - as Ryan stated - contributing to our docs is more complex
> > than
> > > > it
> > > > >> > > should
> > > > >> > > > be and anyone who would like to contribute there is most
> > > welcome.
> > > > I
> > > > >> > very
> > > > >> > > > much share all the points that Ryan made and I think we
> should
> > > > >> welcome
> > > > >> > > any
> > > > >> > > > efforts to make it better. The lack of
> incremental/auto-build
> > > > >> support
> > > > >> > is
> > > > >> > > > especially troublesome for anyone who wants to contribute
> > their
> > > > >> docs.
> > > > >> > > Happy
> > > > >> > > > to help anyone who would like to take on the task.
> > > > >> > > >
> > > > >> > > > Still - if we would like to move old docs outside as a first
> > > step,
> > > > >> I am
> > > > >> > > > happy to help anyone who would like to commit to doing it.
> > > > >> > > >
> > > > >> > > > J.
> > > > >> > > >
> > > > >> > > > On Fri, Oct 20, 2023 at 3:27 PM Pierre Jeambrun <
> > > > >> pierrejb...@gmail.com
> > > > >> > >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > +1 for moving archived docs outside of airflow-site.
> > > > >> > > > >
> > > > >> > > > > Even if that might mean a little more maintenance in case
> we
> > > > need
> > > > >> to
> > > > >> > > > > propagate changes to all historical versions, we would
> have
> > to
> > > > >> > handle 2
> > > > >> > > > > repositories, but that seems like a minor downside
> compared
> > to
> > > > the
> > > > >> > > > quality
> > > > >> > > > > of life improvement that it would bring for airflow-site
> > > > >> > contributions.
> > > > >> > > > >
> > > > >> > > > > On Thu, Oct 19, 2023 at 4:11 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Let me just clarify (because that could be unclear) what
> > my
> > > +1
> > > > >> was
> > > > >> > > > about.
> > > > >> > > > > >
> > > > >> > > > > > I was not talking (and I believe Ryan was not talking
> > > either)
> > > > >> about
> > > > >> > > > > > removing the old docs but about archiving them and
> serving
> > > > from
> > > > >> > > > elsewhere
> > > > >> > > > > > (cloud storage).
> > > > >> > > > > >
> > > > >> > > > > > I think discussing changing to more shared HTML/JS/CSS
> is
> > > > also a
> > > > >> > good
> > > > >> > > > > idea
> > > > >> > > > > > to optimise it, but possibly can be handled separately
> as
> > a
> > > > >> longer
> > > > >> > > > effort
> > > > >> > > > > > of redesigning how the docs are built. But by all means
> we
> > > > could
> > > > >> > also
> > > > >> > > > > work
> > > > >> > > > > > on that.
> > > > >> > > > > >
> > > > >> > > > > > Maybe I jumped to conclusions, but the easiest, tactical
> > > > >> solution
> > > > >> > > (for
> > > > >> > > > > the
> > > > >> > > > > > most acute issue - size) is we just move the old
> generated
> > > > HTML
> > > > >> > docs
> > > > >> > > > from
> > > > >> > > > > > the git repository of "airflow-site" and in the
> > > "github_pages"
> > > > >> > branch
> > > > >> > > > we
> > > > >> > > > > > replace it with redirecting of those pages to the files
> > > served
> > > > >> from
> > > > >> > > the
> > > > >> > > > > > cloud storage (and I believe this is what Ryan hinted
> at).
> > > > >> > > > > >
> > > > >> > > > > > Those redirects could be automatically generated for all
> > > > >> > > > > > historical versions and they will be small. We are already
> > > > >> > > > > > doing it for individual pages for navigating between versions,
> > > > >> > > > > > but we could easily replace all the historical docs with:
> > > > >> > > > > >
> > > > >> > > > > > <html><head><meta http-equiv="refresh" content="0; url=https://new-archive-docs-airflow-url/airflow/version/document.url"/></head></html>
> > > > >> > > > > >
> > > > >> > > > > > Low-tech, surely, and "legacy", but it will solve the size
> > > > >> > > > > > problem instantly. We currently have 115,148 such files, which
> > > > >> > > > > > will go down to about 20 MB of files, which is peanuts compared
> > > > >> > > > > > to the current 17GB (!) we have.
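
A minimal sketch of how those redirect stubs could be generated, assuming the original pages have already been uploaded to the archive under the same relative paths. `ARCHIVE_BASE_URL` reuses the placeholder URL from the message above, and `redirect_stub` and `replace_with_stubs` are hypothetical helper names, not existing airflow-site tooling.

```python
from html import escape
from pathlib import Path

# Placeholder from the thread; the real archive location is not decided here.
ARCHIVE_BASE_URL = "https://new-archive-docs-airflow-url"

REDIRECT_TEMPLATE = (
    '<html><head><meta http-equiv="refresh" content="0; url={url}"/></head></html>\n'
)


def redirect_stub(relative_path: str, base_url: str = ARCHIVE_BASE_URL) -> str:
    """Return a tiny HTML page that redirects to the archived copy."""
    url = f"{base_url.rstrip('/')}/{relative_path.lstrip('/')}"
    return REDIRECT_TEMPLATE.format(url=escape(url, quote=True))


def replace_with_stubs(docs_root: Path, base_url: str = ARCHIVE_BASE_URL) -> int:
    """Overwrite every .html file under docs_root with a redirect stub.

    Non-HTML files are left untouched. Returns the number of files rewritten.
    """
    count = 0
    for page in sorted(docs_root.rglob("*.html")):
        relative = page.relative_to(docs_root).as_posix()
        page.write_text(redirect_stub(relative, base_url), encoding="utf-8")
        count += 1
    return count
```

Running `replace_with_stubs` is destructive, so it would only make sense on a checkout whose originals are confirmed to be in the archive.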
> > > > >> > > > > >
> > > > >> > > > > > We can also inject into the moved "storage" docs, the
> > header
> > > > >> that
> > > > >> > > > informs
> > > > >> > > > > > that this is an old/archived documentation with single
> > > > redirect
> > > > >> to
> > > > >> > > > > > "live"/"stable" site for newer versions of docs (which I
> > > > believe
> > > > >> > > > sparked
> > > > >> > > > > > Ryan's work). This can be done at least as the "quick"
> > > > >> remediation
> > > > >> > > for
> > > > >> > > > > the
> > > > >> > > > > > size issue and something that might allow the current
> > scheme
> > > > to
> > > > >> > > > > > work without ever-growing repo/size and using space for
> > the
> > > > >> build
> > > > >> > > > action.
> > > > >> > > > > > If we have such an automated mechanism in place, we
> could
> > > > >> > > periodically
> > > > >> > > > > > archive old docs. All that without changing the build
> > > process
> > > > of
> > > > >> > ours
> > > > >> > > > and
> > > > >> > > > > > simply keep old "past" docs elsewhere (still accessible
> > for
> > > > >> users).
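
The "old/archived documentation" header mentioned above could be injected into the static pages with something like the sketch below. The banner markup, the `archived-docs-banner` class name and `inject_banner` are all hypothetical; the real theme may want different markup.

```python
import re
from html import escape

# Hypothetical banner markup; not taken from sphinx_airflow_theme.
BANNER_TEMPLATE = (
    '<div class="archived-docs-banner">'
    "This is archived documentation for an old version. "
    '<a href="{stable_url}">Go to the stable version.</a>'
    "</div>"
)

_BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def inject_banner(html: str, stable_url: str) -> str:
    """Insert the banner right after the opening <body> tag, once.

    Pages without a <body> tag, or that already carry the banner, are
    returned unchanged, so the function is safe to re-run.
    """
    if "archived-docs-banner" in html:
        return html
    banner = BANNER_TEMPLATE.format(stable_url=escape(stable_url, quote=True))
    # A callable replacement avoids backslash-escape surprises in the banner text.
    return _BODY_OPEN.sub(lambda m: m.group(0) + banner, html, count=1)
```

Because the check for an existing banner makes the function idempotent, it could run as part of a periodic archiving job without stacking banners.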
> > > > >> > > > > >
> > > > >> > > > > > Not much should change for the users IMHO - if they go
> to
> > > the
> > > > >> old
> > > > >> > > > version
> > > > >> > > > > > of the docs or use old, archived URLs, they would end up
> > > > seeing
> > > > >> the
> > > > >> > > > > > same content/navigation they see today (with extra
> > > information
> > > > >> it's
> > > > >> > > an
> > > > >> > > > > old
> > > > >> > > > > > version and served from a different URL).
> > > > >> > > > > > When they go to the "old" version of documentation they
> > > could
> > > > be
> > > > >> > > > > redirected
> > > > >> > > > > > to the new one - same HTML but hosted on cloud storage,
> > > fully
> > > > >> > > > statically.
> > > > >> > > > > > We already do that with "redirect" mechanism.
> > > > >> > > > > >
> > > > >> > > > > > In the meantime, someone could also work on a strategic
> > > > >> solution -
> > > > >> > > and
> > > > >> > > > > > changing the current build process, but this is - I
> think
> > a
> > > > >> > > different -
> > > > >> > > > > > and much more complex and requiring a lot of effort -
> > step.
> > > > And
> > > > >> it
> > > > >> > > > could
> > > > >> > > > > > simply end up with regenerating whatever is left as
> "live"
> > > > >> > > > documentation
> > > > >> > > > > > (leaving the archive docs intact).
> > > > >> > > > > >
> > > > >> > > > > > That's at least what I see as a possible set of steps to
> > > take.
> > > > >> > > > > >
> > > > >> > > > > > J.
> > > > >> > > > > >
> > > > >> > > > > > On Thu, Oct 19, 2023 at 2:14 PM utkarsh sharma <
> > > > >> > > utkarshar...@gmail.com
> > > > >> > > > >
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hey everyone,
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks, Ryan, for starting the thread :)
> > > > >> > > > > > >
> > > > >> > > > > > > Big +1 For archiving docs older than 18 months. We can
> > > still
> > > > >> make
> > > > >> > > the
> > > > >> > > > > > older
> > > > >> > > > > > > docs available in `rst` doc form.
> > > > >> > > > > > >
> > > > >> > > > > > > But eventually, we might again run into this problem
> > > because
> > > > >> of
> > > > >> > the
> > > > >> > > > > > growing
> > > > >> > > > > > > no. of providers. I think the main reason for this
> issue
> > > is
> > > > >> the
> > > > >> > > > > generated
> > > > >> > > > > > > static HTML pages and the way we cater to them using
> > > GitHub
> > > > >> > Pages.
> > > > >> > > > The
> > > > >> > > > > > > generated pages have lots of common code
> > > > >> > > > > > > HTML(headers/navigation/breadcrumbs/footer etc.) CSS,
> JS
> > > > >> which is
> > > > >> > > > > > repeated
> > > > >> > > > > > > for every provider and every version of that provider.
> > If
> > > we
> > > > >> > have a
> > > > >> > > > > more
> > > > >> > > > > > > dynamic way(Django/Flask Servers) of catering the
> > > documents
> > > > we
> > > > >> > can
> > > > >> > > > save
> > > > >> > > > > > all
> > > > >> > > > > > > the space for common HTML/CSS/JS.
> > > > >> > > > > > >
> > > > >> > > > > > > But the downsides of this approach are:
> > > > >> > > > > > > 1. We need to have a server
> > > > >> > > > > > > 2. Also require changes in the existing document build
> > > > >> process to
> > > > >> > > > only
> > > > >> > > > > > > produce partial HTML documents.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Utkarsh Sharma
> > > > >> > > > > > >
> > > > >> > > > > > > On Thu, Oct 19, 2023 at 4:08 PM Jarek Potiuk <
> > > > >> ja...@potiuk.com>
> > > > >> > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Yes. Moving the old version to somewhere that we can
> > > > >> > keep/archive
> > > > >> > > > > > static
> > > > >> > > > > > > > historical versions of those historical docs and
> > publish
> > > > >> them
> > > > >> > > from
> > > > >> > > > > > there.
> > > > >> > > > > > > > What you proposed is exactly the solution I thought
> > > might
> > > > be
> > > > >> > best
> > > > >> > > > as
> > > > >> > > > > > > well.
> > > > >> > > > > > > >
> > > > >> > > > > > > > It would be a great task to contribute to the
> > stability
> > > of
> > > > >> our
> > > > >> > > docs
> > > > >> > > > > > > > generation in the future.
> > > > >> > > > > > > >
> > > > >> > > > > > > > I don't think it's a matter of discussing in detail
> > how
> > > to
> > > > >> do
> > > > >> > it
> > > > >> > > > (18
> > > > >> > > > > > > months
> > > > >> > > > > > > > is a good start and you can parameterize it), It's
> the
> > > > >> matter
> > > > >> > of
> > > > >> > > > > > > > someone committing to it and doing it simply :).
> > > > >> > > > > > > >
> > > > >> > > > > > > > So yes I personally am all for it and if I
> understand
> > > > >> correctly
> > > > >> > > > that
> > > > >> > > > > > you
> > > > >> > > > > > > > are looking for agreement on doing it, big +1 from
> my
> > > > side -
> > > > >> > > happy
> > > > >> > > > to
> > > > >> > > > > > > help
> > > > >> > > > > > > > with providing access to our S3 buckets.
> > > > >> > > > > > > >
> > > > >> > > > > > > > J.
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Thu, Oct 19, 2023 at 5:39 AM Ryan Hatter
> > > > >> > > > > > > > <ryan.hat...@astronomer.io.invalid> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > *tl;dr*
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >    1. The GitHub Action for building docs is running out of space. I
> > > > >> > > > > > > > >    think we should archive really old documentation for large packages
> > > > >> > > > > > > > >    to cloud storage.
> > > > >> > > > > > > > >    2. Contributing to and building Airflow docs is hard. We should
> > > > >> > > > > > > > >    migrate to a framework, preferably one that uses markdown (although
> > > > >> > > > > > > > >    I acknowledge rst -> md will be a massive overhaul).
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > *Problem Summary*
> > > > >> > > > > > > > > I recently set out to implement what I thought would be a
> > > > >> > > > > > > > > straightforward feature: warn users when they are viewing
> > > > >> > > > > > > > > documentation for non-current versions of Airflow and link them to the
> > > > >> > > > > > > > > current/stable version
> > > > >> > > > > > > > > <https://github.com/apache/airflow/pull/34639>. Jed pointed me to the
> > > > >> > > > > > > > > airflow-site <https://github.com/apache/airflow-site> repo, which
> > > > >> > > > > > > > > contains all of the archived docs (that is, documentation for
> > > > >> > > > > > > > > non-current versions), and from there, I ran into a brick wall.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I want to raise some concerns that I've developed after trying to
> > > > >> > > > > > > > > contribute what feel like a couple reasonably small docs updates:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >    1. airflow-site
> > > > >> > > > > > > > >       1. Elad pointed out the problem posed by the sheer size of
> > > > >> > > > > > > > >       archived docs
> > > > >> > > > > > > > >       <https://apache-airflow.slack.com/archives/CCPRP7943/p1697009000242369?thread_ts=1696973512.004229&cid=CCPRP7943>
> > > > >> > > > > > > > >       (more on this later).
> > > > >> > > > > > > > >       2. The airflow-site repo is confusing, and rather poorly
> > > > >> > > > > > > > >       documented.
> > > > >> > > > > > > > >          1. Hugo (static site generator) exists, but appears to only be
> > > > >> > > > > > > > >          used for the landing pages
> > > > >> > > > > > > > >          2. In order to view any documentation locally other than the
> > > > >> > > > > > > > >          landing pages, you'll need to run the site.sh script then copy
> > > > >> > > > > > > > >          the output from one dir to another?
> > > > >> > > > > > > > >       3. All of the archived docs are raw HTML, making migrating to a
> > > > >> > > > > > > > >       static site generator a significant challenge, which makes it
> > > > >> > > > > > > > >       difficult to prevent the archived docs from continuing to grow
> > > > >> > > > > > > > >       and grow. Perhaps this is the wheel Khaleesi was referring to
> > > > >> > > > > > > > >       <https://www.youtube.com/watch?v=J-rxmk6zPxA>?
> > > > >> > > > > > > > >    2. airflow
> > > > >> > > > > > > > >       1. Building Airflow docs is a challenge. It takes several minutes
> > > > >> > > > > > > > >       and doesn't support auto-build, so the slightest issue could
> > > > >> > > > > > > > >       require waiting again and again until the changes are just so. I
> > > > >> > > > > > > > >       tried implementing sphinx-autobuild
> > > > >> > > > > > > > >       <https://github.com/executablebooks/sphinx-autobuild> to no avail.
> > > > >> > > > > > > > >       2. Sphinx/restructured text has a steep learning curve.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > *The most acute issue: disk space*
> > > > >> > > > > > > > > The size of the archived docs is causing the docs build GitHub Action
> > > > >> > > > > > > > > to almost run out of space. From the "Build site" Action from a couple
> > > > >> > > > > > > > > weeks ago
> > > > >> > > > > > > > > <https://github.com/apache/airflow-site/actions/runs/6419529645/job/17432628458>
> > > > >> > > > > > > > > (expand the build site step, scroll all the way to the bottom, expand
> > > > >> > > > > > > > > the `df -h` command), we can see the GitHub Action runner (or whatever
> > > > >> > > > > > > > > it's called) is nearly running out of space:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > df -h
> > > > >> > > > > > > > >   *Filesystem      Size  Used Avail Use% Mounted on*
> > > > >> > > > > > > > >   /dev/root        84G   82G  2.1G  98% /
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > The available space is down to 1.8G on the most recent Action
> > > > >> > > > > > > > > <https://github.com/apache/airflow-site/actions/runs/6564727255/job/17831714176>.
> > > > >> > > > > > > > > If we assume that trend is accurate, we have about two months before
> > > > >> > > > > > > > > the Action runner runs out of disk space. Here's a breakdown of the
> > > > >> > > > > > > > > space consumed by the 10 largest package documentation directories:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > du -h -d 1 docs-archive/ | sort -h -r
> > > > >> > > > > > > > > * 14G* docs-archive/
> > > > >> > > > > > > > > *4.0G* docs-archive//apache-airflow-providers-google
> > > > >> > > > > > > > > *3.2G* docs-archive//apache-airflow
> > > > >> > > > > > > > > *1.7G* docs-archive//apache-airflow-providers-amazon
> > > > >> > > > > > > > > *560M* docs-archive//apache-airflow-providers-microsoft-azure
> > > > >> > > > > > > > > *254M* docs-archive//apache-airflow-providers-cncf-kubernetes
> > > > >> > > > > > > > > *192M* docs-archive//apache-airflow-providers-apache-hive
> > > > >> > > > > > > > > *153M* docs-archive//apache-airflow-providers-snowflake
> > > > >> > > > > > > > > *139M* docs-archive//apache-airflow-providers-databricks
> > > > >> > > > > > > > > *104M* docs-archive//apache-airflow-providers-docker
> > > > >> > > > > > > > > *101M* docs-archive//apache-airflow-providers-mysql
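
The time-to-full estimate above is a straight-line extrapolation from two free-space readings (2.1G, then 1.8G). A small sketch of that arithmetic follows; `days_until_full` is a hypothetical helper, and the number of days between the two runs is an input the thread does not state precisely ("a couple weeks"), so the result is quite sensitive to whatever gap is assumed.

```python
from typing import Optional


def days_until_full(
    free_then_gb: float, free_now_gb: float, days_elapsed: float
) -> Optional[float]:
    """Linearly extrapolate how many days remain until free space hits zero.

    Returns None when free space is not shrinking, since a linear trend
    then never reaches zero.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_loss_gb = (free_then_gb - free_now_gb) / days_elapsed
    if daily_loss_gb <= 0:
        return None
    return free_now_gb / daily_loss_gb


# Readings from the two df -h outputs; the 14-day gap is an assumption.
estimate = days_until_full(free_then_gb=2.1, free_now_gb=1.8, days_elapsed=14)
```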
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > *Proposed solution: Archive old docs html for large packages to cloud
> > > > >> > > > > > > > > storage*
> > > > >> > > > > > > > > I'm wondering if it would be reasonable to truly archive the docs for
> > > > >> > > > > > > > > some of the older versions of these packages. Perhaps the last 18
> > > > >> > > > > > > > > months? Maybe we could drop the html in a blob storage bucket with
> > > > >> > > > > > > > > instructions for building the docs if absolutely necessary?
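
One way the age cutoff could be parameterized is sketched below, assuming release dates per version are available from somewhere (the thread does not say where). `months_between` and `versions_to_archive` are hypothetical names, and the 18-month default is only the figure floated in this discussion.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return max(months, 0)


def versions_to_archive(
    release_dates: dict[str, date], today: date, max_age_months: int = 18
) -> list[str]:
    """Return versions released at least max_age_months before today.

    The result is sorted as plain strings, which is not a semantic
    version ordering.
    """
    return sorted(
        version
        for version, released in release_dates.items()
        if months_between(released, today) >= max_age_months
    )
```

Keeping the cutoff as a parameter means the policy ("X time") can be changed without touching the selection logic.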
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > *Improving docs building moving forward*
> > > > >> > > > > > > > > There's an open Issue
> > > > >> > > > > > > > > <https://github.com/apache/airflow-site/issues/719> for migrating the
> > > > >> > > > > > > > > docs to a framework, but it's not at all a straightforward task for
> > > > >> > > > > > > > > the archived docs. I think that we should institute a policy of
> > > > >> > > > > > > > > archiving old documentation to cloud storage after X time and use a
> > > > >> > > > > > > > > framework for building docs in a scalable and sustainable way moving
> > > > >> > > > > > > > > forward. Maybe we could chat with iceberg folks about how they moved
> > > > >> > > > > > > > > from mkdocs to hugo? <https://github.com/apache/iceberg/issues/3616>
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Shoutout to Utkarsh for helping me through all
> this!
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
