I'm happy to work on this alongside Utkarsh, Amogh Desai, and Aritra Basu
:)
Some thoughts on Utkarsh's proposal (and what he and I have been
discussing offline):

   1. I think we should start with enabling Hugo in the documentation build
   process for new releases
      1. This may need to include a way to serve html from S3, as I think
      we'll need to build each version for each package (apache-airflow &
      providers). If we do this each time, the amount of docs built will grow
      exponentially and we might find ourselves again in a similar situation
      2. Once this is done, all new docs will be buildable without storing
      the raw html locally
      3. I think a good example (at least for a lot of this process) is how
      the Apache Iceberg docs repo
      <https://github.com/apache/iceberg-docs/tree/main> is built.
   2. Once that's complete, we can implement a process to archive the raw
   .rst files for docs older than 18 months to S3 along with a way to download
   and build those in the airflow-site repo.
      1. This will result in temporarily having two separate builds:
         1. One for archived docs like we do now
         2. And the build process for new docs developed in (1)
      2. After 18 months, all of the archived docs will be out of the repo,
      and we can move forward with only the build process developed in (1)
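
To make the 18-month cutoff in (2) a bit more concrete, here is a minimal
sketch of how the versions to archive might be picked. Everything in it is
an assumption for illustration: the function name, the approximation of 18
months as 548 days, and the made-up versions and release dates. The actual
upload to S3 is deliberately left out.

```python
from datetime import date, timedelta

# Assumption: "18 months" approximated as 548 days; the real cutoff could be
# made configurable.
ARCHIVE_AFTER = timedelta(days=548)


def versions_to_archive(release_dates: dict[str, date], today: date) -> list[str]:
    """Return versions released before the cutoff, oldest first.

    Hypothetical helper: release_dates maps a version string to its release date.
    """
    cutoff = today - ARCHIVE_AFTER
    old = [
        (released, version)
        for version, released in release_dates.items()
        if released < cutoff
    ]
    return [version for _, version in sorted(old)]


# Made-up versions and dates, not real release data.
example = {
    "1.0.0": date(2021, 1, 15),
    "2.0.0": date(2022, 6, 1),
    "3.0.0": date(2023, 9, 1),
}
print(versions_to_archive(example, today=date(2023, 10, 27)))
```

Running something like this per package would give the list of doc versions
to move out of the repo on each pass.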


On Fri, Oct 27, 2023 at 7:55 AM utkarsh sharma <utkarshar...@gmail.com>
wrote:

> That sounds good, I'll start with creating smaller tickets for the above
> task, which I intend to do by the end of this week.
>
> Thanks,
> Utkarsh Sharma
>
>
> On Thu, Oct 26, 2023 at 4:16 PM Aritra Basu <aritrabasu1...@gmail.com>
> wrote:
>
> > Yup, sounds good to me let's go for it!
> >
> > --
> > Regards,
> > Aritra Basu
> >
> > On Thu, Oct 26, 2023, 1:47 PM Amogh Desai <amoghdesai....@gmail.com>
> > wrote:
> >
> > > Go ahead, Utkarsh. It would be nice to work with you on this.
> > >
> > > Thanks,
> > > Amogh Desai
> > >
> > > On Wed, Oct 25, 2023 at 10:02 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> > >
> > > > > +1. I think no-one will object to improving the current situation :)
> > > >
> > > > On Wed, Oct 25, 2023 at 5:02 PM utkarsh sharma <
> utkarshar...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hey everyone,
> > > > >
> > > > > If we have a consensus on the suggestions in my previous email, I
> > would
> > > > > like to subdivide the task into smaller tickets and distribute them
> > > among
> > > > > Aritra Basu, Amogh Desai, and myself.
> > > > >
> > > > > Thanks,
> > > > > Utkarsh Sharma
> > > > >
> > > > > On Tue, Oct 24, 2023 at 10:12 PM Jarek Potiuk <ja...@potiuk.com>
> > > wrote:
> > > > >
> > > > > > Those look like great ideas.
> > > > > >
> > > > > > On Tue, Oct 24, 2023 at 4:23 PM utkarsh sharma <
> > > utkarshar...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Just forgot to mention in my previous mail, that I'm suggesting
> > the
> > > > > above
> > > > > > > changes since the storage is not the primary concern right now
> > but
> > > > I'm
> > > > > > > happy to contribute either way. :)
> > > > > > >
> > > > > > > On Tue, Oct 24, 2023 at 7:43 PM utkarsh sharma <
> > > > utkarshar...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey everyone,
> > > > > > > >
> > > > > > > > > I have a couple of tasks in mind that might reduce the effort of
> > > > > > > > > working with docs. Right now the tasks listed below are difficult
> > > > > > > > > to achieve:
> > > > > > > > >
> > > > > > > > > 1. Adding a warning based on a specific provider/version of a
> > > > > > > > > provider/range of providers, which was also the task that Ryan was
> > > > > > > > > working on.
> > > > > > > > > 2. Altering a page layout or CSS for a specific provider.
> > > > > > > >
> > > > > > > > > The above tasks are hard to achieve because of the pre-prepared
> > > > > > > > > static files that *breeze build-docs* produces in the docs/_build
> > > > > > > > > folder. These files are self-sufficient to be hosted and are used
> > > > > > > > > directly, leaving no room for customization of any sort.
> > > > > > > >
> > > > > > > >
> > > > > > > > My proposal would be to break down this process as follows:
> > > > > > > >
> > > > > > > > > 1. We can prepare partial documents as part of *breeze build-docs*
> > > > > > > > > which are only responsible for providing the HTML to be populated
> > > > > > > > > within the Body tag for a specific provider, and not the layout of
> > > > > > > > > the entire page.
> > > > > > > > > 2. We then copy the partial static files to the Airflow-site repo
> > > > > > > > > within landing pages/site/layouts/docs, where the layout of the
> > > > > > > > > page will be provided by `single.html` and a listing of all the
> > > > > > > > > providers will be provided by `list.html`, which are standard Hugo
> > > > > > > > > <https://gohugo.io/about/what-is-hugo/> features. Also, using
> > > > > > > > > static files from `sphinx_airflow_theme`, which lives in the same
> > > > > > > > > repo, makes changes to the CSS easy.
> > > > > > > > > 3. We can then use Hugo to generate static files
> > > > > > > > > <https://gohugo.io/getting-started/quick-start/#publish-the-site>
> > > > > > > > > and push them to the `gh-pages` branch to publish them using
> > > > > > > > > GitHub Pages.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Doing the above changes will enable us to do the following:
> > > > > > > > >
> > > > > > > > > 1. It will give us more control to work on a specific
> > > > > > > > > provider/provider-version, if we want, by providing templates -
> > > > > > > > > https://gohugo.io/templates/lookup-order/
> > > > > > > > > 2. We will have specific code to look at depending on the changes
> > > > > > > > > one intends to make. Right now, if you don't know the flow, it's a
> > > > > > > > > bit difficult to pinpoint the code to change.
> > > > > > > > >    1. If we want to make changes to a specific provider's content,
> > > > > > > > >    we can do it in the Airflow repo's docs/<provider>/*.rst files.
> > > > > > > > >    2. If we have a change that affects multiple providers or
> > > > > > > > >    versions, we can do it in the Airflow website's repo.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Utkarsh Sharma
> > > > > > > >
> > > > > > > > On Tue, Oct 24, 2023 at 3:45 PM Jarek Potiuk <
> ja...@potiuk.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > >> So it looks like we have some helping hands and we need
> > someone
> > > to
> > > > > > lead
> > > > > > > it
> > > > > > > >> :) (just saying).
> > > > > > > >>
> > > > > > > >> On Tue, Oct 24, 2023 at 8:15 AM Amogh Desai <
> > > > > amoghdesai....@gmail.com
> > > > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > +1 (non binding) from me on the thought of moving the
> older
> > > docs
> > > > > > (~18
> > > > > > > >> > months seems ok) to an archive instead of the repository.
> > > > > > > >> >
> > > > > > > >> > Coming to the other problem of copying the built docs into
> > > > > > > airflow-site
> > > > > > > >> for
> > > > > > > >> > releases, maybe we can fix that using a script? Open for
> > > > thoughts
> > > > > > > here.
> > > > > > > >> >
> > > > > > > >> > I would be very happy to help when we start taking this
> > > > forward, I
> > > > > > > have
> > > > > > > >> > some experience in airflow-site and docs side as well.
> Feel
> > > free
> > > > > to
> > > > > > > >> reach
> > > > > > > >> > out over email or slack :)
> > > > > > > >> >
> > > > > > > >> > Thanks & Regards,
> > > > > > > >> > Amogh Desai
> > > > > > > >> >
> > > > > > > >> > On Mon, Oct 23, 2023 at 3:08 AM Aritra Basu <
> > > > > > aritrabasu1...@gmail.com
> > > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > This definitely sounds like something that needs doing
> > > sooner
> > > > > > rather
> > > > > > > >> than
> > > > > > > >> > > later.
> > > > > > > >> > >
> > > > > > > >> > > While I'd love to help, I'm not too experienced with
> this
> > > area
> > > > > so
> > > > > > I
> > > > > > > >> might
> > > > > > > >> > > not be able to actually propose what changes need doing,
> > but
> > > > if
> > > > > > > >> someone
> > > > > > > >> > has
> > > > > > > >> > > a path forward on this I can definitely contribute some
> > time
> > > > to
> > > > > > help
> > > > > > > >> out
> > > > > > > >> > > given some guidance on what is needed.
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > > Regards,
> > > > > > > >> > > Aritra Basu
> > > > > > > >> > >
> > > > > > > >> > > On Mon, Oct 23, 2023, 2:19 AM Jarek Potiuk <
> > > ja...@potiuk.com>
> > > > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Some news here.
> > > > > > > >> > > >
> > > > > > > >> > > > I caught up with some infra changes that happened
> while
> > I
> > > > was
> > > > > > > >> > travelling
> > > > > > > >> > > -
> > > > > > > >> > > > and I have just (with
> > > > > > > >> https://github.com/apache/airflow-site/pull/879)
> > > > > > > >> > > > switched the "airflow-site" building to the new,
> > > self-hosted
> > > > > > > >> > > "asf-runners".
> > > > > > > >> > > > This is a new option that ASF infra has given to test
> > for
> > > > the
> > > > > > ASF
> > > > > > > >> > > projects
> > > > > > > >> > > > - rather than relying on "public runners", we can
> switch
> > > to
> > > > > > > >> self-hosted
> > > > > > > >> > > > runners donated by Microsoft to the ASF. More info
> here:
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=ASF+Infra+provided+self-hosted+runners
> > > > > > > >> > > >
> > > > > > > >> > > > The most important result is that we now have a lot
> more
> > > > > > > "breathing
> > > > > > > >> > > space"
> > > > > > > >> > > > for the docs building job. During the build we are
> using
> > > max
> > > > > 59%
> > > > > > > of
> > > > > > > >> the
> > > > > > > >> > > > disk space - with 73GB used and 52GB free.
> > > > > > > >> > > >
> > > > > > > >> > > >  Filesystem      Size  Used Avail Use% Mounted on
> > > > > > > >> > > >   overlay         124G   73G   52G  59% /
> > > > > > > >> > > >
> > > > > > > >> > > > This is - on one hand - good news (disk space is not
> an
> > > > > "acute"
> > > > > > > >> issue
> > > > > > > >> > any
> > > > > > > >> > > > more), I think if someone would like to work on
> > improving
> > > > the
> > > > > > docs
> > > > > > > >> > > building
> > > > > > > >> > > > of ours, they have much more breathing space to do so.
> > > > > > > >> > > > But - clearly - it might mean that the incentive to
> work
> > > on
> > > > it
> > > > > > > >> > decreased
> > > > > > > >> > > -
> > > > > > > >> > > > because it "just works"). That's the bad effect of it.
> > > And I
> > > > > > think
> > > > > > > >> it's
> > > > > > > >> > > not
> > > > > > > >> > > > good, though the most I can do is to reiterate Ryan's
> > > > concerns
> > > > > > and
> > > > > > > >> hope
> > > > > > > >> > > we
> > > > > > > >> > > > will get someone committing to improving this.
> > > > > > > >> > > >
> > > > > > > >> > > > I would strongly encourage those who want to improve
> it,
> > > to
> > > > do
> > > > > > > so. I
> > > > > > > >> > > think
> > > > > > > >> > > > - as Ryan stated - contributing to our docs is more
> > > complex
> > > > > than
> > > > > > > it
> > > > > > > >> > > should
> > > > > > > >> > > > be and anyone who would like to contribute there is
> most
> > > > > > welcome.
> > > > > > > I
> > > > > > > >> > very
> > > > > > > >> > > > much share all the points that Ryan made and I think
> we
> > > > should
> > > > > > > >> welcome
> > > > > > > >> > > any
> > > > > > > >> > > > efforts to make it better. The lack of
> > > > incremental/auto-build
> > > > > > > >> support
> > > > > > > >> > is
> > > > > > > >> > > > especially troublesome for anyone who wants to
> > contribute
> > > > > their
> > > > > > > >> docs.
> > > > > > > >> > > Happy
> > > > > > > >> > > > to help anyone who would like to take on the task.
> > > > > > > >> > > >
> > > > > > > >> > > > Still - if we would like to move old docs outside as a
> > > first
> > > > > > step,
> > > > > > > >> I am
> > > > > > > >> > > > happy to help anyone who would like to commit to doing
> > it.
> > > > > > > >> > > >
> > > > > > > >> > > > J.
> > > > > > > >> > > >
> > > > > > > >> > > > On Fri, Oct 20, 2023 at 3:27 PM Pierre Jeambrun <
> > > > > > > >> pierrejb...@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > +1 from moving archived docs outside of
> airflow-site.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Even if that might mean a little more maintenance in
> > > case
> > > > we
> > > > > > > need
> > > > > > > >> to
> > > > > > > >> > > > > propagate changes to all historical versions, we
> would
> > > > have
> > > > > to
> > > > > > > >> > handle 2
> > > > > > > >> > > > > repositories, but that seems like a minor downside
> > > > compared
> > > > > to
> > > > > > > the
> > > > > > > >> > > > quality
> > > > > > > >> > > > > of life improvement that it would bring for
> > airflow-site
> > > > > > > >> > contributions.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Le jeu. 19 oct. 2023 à 16:11, Jarek Potiuk <
> > > > > ja...@potiuk.com>
> > > > > > a
> > > > > > > >> > écrit
> > > > > > > >> > > :
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Let me just clarify (because that could be
> unclear)
> > > what
> > > > > my
> > > > > > +1
> > > > > > > >> was
> > > > > > > >> > > > about.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > I was not talking (and I believe Ryan was not
> > talking
> > > > > > either)
> > > > > > > >> about
> > > > > > > >> > > > > > removing the old docs but about archiving them and
> > > > serving
> > > > > > > from
> > > > > > > >> > > > elsewhere
> > > > > > > >> > > > > > (cloud storage).
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > I think discussing changing to more shared
> > HTML/JS/CSS
> > > > is
> > > > > > > also a
> > > > > > > >> > good
> > > > > > > >> > > > > idea
> > > > > > > >> > > > > > to optimise it, but possibly can be handled
> > separately
> > > > as
> > > > > a
> > > > > > > >> longer
> > > > > > > >> > > > effort
> > > > > > > >> > > > > > of redesigning how the docs are built. But by all
> > > means
> > > > we
> > > > > > > could
> > > > > > > >> > also
> > > > > > > >> > > > > work
> > > > > > > >> > > > > > on that.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Maybe I jumped to conclusions, but the easiest,
> > > tactical
> > > > > > > >> solution
> > > > > > > >> > > (for
> > > > > > > >> > > > > the
> > > > > > > >> > > > > > most acute issue - size) is we just move the old
> > > > generated
> > > > > > > HTML
> > > > > > > >> > docs
> > > > > > > >> > > > from
> > > > > > > >> > > > > > the git repository of "airflow-site" and in the
> > > > > > "github_pages"
> > > > > > > >> > branch
> > > > > > > >> > > > we
> > > > > > > >> > > > > > replace it with redirecting of those pages to the
> > > files
> > > > > > served
> > > > > > > >> from
> > > > > > > >> > > the
> > > > > > > >> > > > > > cloud storage (and I believe this is what Ryan
> > hinted
> > > > at).
> > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Those redirects could be automatically generated for all
> > > > > > > > >> > > > > > historical versions and they will be small. We are already
> > > > > > > > >> > > > > > doing it for individual pages for navigating between versions,
> > > > > > > > >> > > > > > but we could easily replace all the historical docs with:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > <html><head><meta http-equiv="refresh" content="0; url=https://new-archive-docs-airflow-url/airflow/version/document.url"/></head></html>
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Low-tech, surely, and "legacy", but it will solve the size
> > > > > > > > >> > > > > > problem instantly. We currently have 115,148 such files, which
> > > > > > > > >> > > > > > will go down to about 20 MB of files, which is peanuts
> > > > > > > > >> > > > > > compared to the current 17GB (!) we have.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > We can also inject into the moved "storage" docs,
> > the
> > > > > header
> > > > > > > >> that
> > > > > > > >> > > > informs
> > > > > > > >> > > > > > that this is an old/archived documentation with
> > single
> > > > > > > redirect
> > > > > > > >> to
> > > > > > > >> > > > > > "live"/"stable" site for newer versions of docs
> > > (which I
> > > > > > > believe
> > > > > > > >> > > > sparked
> > > > > > > >> > > > > > Ryan's work). This can be done at least as the
> > "quick"
> > > > > > > >> remediation
> > > > > > > >> > > for
> > > > > > > >> > > > > the
> > > > > > > >> > > > > > size issue and something that might allow the
> > current
> > > > > scheme
> > > > > > > to
> > > > > > > >> > > > > > work without ever-growing repo/size and using
> space
> > > for
> > > > > the
> > > > > > > >> build
> > > > > > > >> > > > action.
> > > > > > > >> > > > > > If we have such an automated mechanism in place,
> we
> > > > could
> > > > > > > >> > > periodically
> > > > > > > >> > > > > > archive old docs. All that without changing the
> > build
> > > > > > process
> > > > > > > of
> > > > > > > >> > ours
> > > > > > > >> > > > and
> > > > > > > >> > > > > > simply keep old "past" docs elsewhere (still
> > > accessible
> > > > > for
> > > > > > > >> users).
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Not much should change for the users IMHO - if
> they
> > go
> > > > to
> > > > > > the
> > > > > > > >> old
> > > > > > > >> > > > version
> > > > > > > >> > > > > > of the docs or use old, archived URLs, they would
> > end
> > > up
> > > > > > > seeing
> > > > > > > >> the
> > > > > > > >> > > > > > same content/navigation they see today (with extra
> > > > > > information
> > > > > > > >> it's
> > > > > > > >> > > an
> > > > > > > >> > > > > old
> > > > > > > >> > > > > > version and served from a different URL).
> > > > > > > >> > > > > > When they go to the "old" version of documentation
> > > they
> > > > > > could
> > > > > > > be
> > > > > > > >> > > > > redirected
> > > > > > > >> > > > > > to the new one - same HTML but hosted on cloud
> > > storage,
> > > > > > fully
> > > > > > > >> > > > statically.
> > > > > > > >> > > > > > We already do that with "redirect" mechanism.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > In the meantime, someone could also work on a
> > > strategic
> > > > > > > >> solution -
> > > > > > > >> > > and
> > > > > > > >> > > > > > changing the current build process, but this is -
> I
> > > > think
> > > > > a
> > > > > > > >> > > different -
> > > > > > > >> > > > > > and much more complex and requiring a lot of
> effort
> > -
> > > > > step.
> > > > > > > And
> > > > > > > >> it
> > > > > > > >> > > > could
> > > > > > > >> > > > > > simply end up with regenerating whatever is left
> as
> > > > "live"
> > > > > > > >> > > > documentation
> > > > > > > >> > > > > > (leaving the archive docs intact).
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > That's at least what I see as a possible set of
> > steps
> > > to
> > > > > > take.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > J.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Thu, Oct 19, 2023 at 2:14 PM utkarsh sharma <
> > > > > > > >> > > utkarshar...@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > Hey everyone,
> > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Thanks, Ryan, for starting the thread :)
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Big +1 For archiving docs older than 18 months.
> We
> > > can
> > > > > > still
> > > > > > > >> make
> > > > > > > >> > > the
> > > > > > > >> > > > > > older
> > > > > > > >> > > > > > > docs available in `rst` doc form.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > But eventually, we might again run into this
> > problem
> > > > > > because
> > > > > > > >> of
> > > > > > > >> > the
> > > > > > > >> > > > > > growing
> > > > > > > >> > > > > > > no. of providers. I think the main reason for
> this
> > > > issue
> > > > > > is
> > > > > > > >> the
> > > > > > > >> > > > > generated
> > > > > > > >> > > > > > > static HTML pages and the way we cater to them
> > using
> > > > > > GitHub
> > > > > > > >> > Pages.
> > > > > > > >> > > > The
> > > > > > > >> > > > > > > generated pages have lots of common code
> > > > > > > >> > > > > > > HTML(headers/navigation/breadcrumbs/footer etc.)
> > > CSS,
> > > > JS
> > > > > > > >> which is
> > > > > > > >> > > > > > repeated
> > > > > > > >> > > > > > > for every provider and every version of that
> > > provider.
> > > > > If
> > > > > > we
> > > > > > > >> > have a
> > > > > > > >> > > > > more
> > > > > > > >> > > > > > > dynamic way(Django/Flask Servers) of catering
> the
> > > > > > documents
> > > > > > > we
> > > > > > > >> > can
> > > > > > > >> > > > save
> > > > > > > >> > > > > > all
> > > > > > > >> > > > > > > the space for common HTML/CSS/JS.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > But the downsides of this approach are:
> > > > > > > >> > > > > > > 1. We need to have a server
> > > > > > > >> > > > > > > 2. Also require changes in the existing document
> > > build
> > > > > > > >> process to
> > > > > > > >> > > > only
> > > > > > > >> > > > > > > produce partial HTML documents.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Thanks,
> > > > > > > >> > > > > > > Utkarsh Sharma
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > On Thu, Oct 19, 2023 at 4:08 PM Jarek Potiuk <
> > > > > > > >> ja...@potiuk.com>
> > > > > > > >> > > > wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Yes. Moving the old version to somewhere that
> we
> > > can
> > > > > > > >> > keep/archive
> > > > > > > >> > > > > > static
> > > > > > > >> > > > > > > > historical versions of those historical docs
> and
> > > > > publish
> > > > > > > >> them
> > > > > > > >> > > from
> > > > > > > >> > > > > > there.
> > > > > > > >> > > > > > > > What you proposed is exactly the solution I
> > > thought
> > > > > > might
> > > > > > > be
> > > > > > > >> > best
> > > > > > > >> > > > as
> > > > > > > >> > > > > > > well.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > It would be a great task to contribute to the
> > > > > stability
> > > > > > of
> > > > > > > >> our
> > > > > > > >> > > docs
> > > > > > > >> > > > > > > > generation in the future.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > I don't think it's a matter of discussing in
> > > detail
> > > > > how
> > > > > > to
> > > > > > > >> do
> > > > > > > >> > it
> > > > > > > >> > > > (18
> > > > > > > >> > > > > > > months
> > > > > > > >> > > > > > > > is a good start and you can parameterize it),
> > It's
> > > > the
> > > > > > > >> matter
> > > > > > > >> > of
> > > > > > > >> > > > > > > > someone committing to it and doing it simply
> :).
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > So yes I personally am all for it and if I
> > > > understand
> > > > > > > >> correctly
> > > > > > > >> > > > that
> > > > > > > >> > > > > > you
> > > > > > > >> > > > > > > > are looking for agreement on doing it, big +1
> > from
> > > > my
> > > > > > > side -
> > > > > > > >> > > happy
> > > > > > > >> > > > to
> > > > > > > >> > > > > > > help
> > > > > > > >> > > > > > > > with providing access to our S3 buckets.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > J.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > On Thu, Oct 19, 2023 at 5:39 AM Ryan Hatter
> > > > > > > >> > > > > > > > <ryan.hat...@astronomer.io.invalid> wrote:
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > *tl;dr*
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >    1. The GitHub Action for building docs is running out of
> > > > > > > > >> > > > > > > > >    space. I think we should archive really old documentation
> > > > > > > > >> > > > > > > > >    for large packages to cloud storage.
> > > > > > > > >> > > > > > > > >    2. Contributing to and building Airflow docs is hard. We
> > > > > > > > >> > > > > > > > >    should migrate to a framework, preferably one that uses
> > > > > > > > >> > > > > > > > >    markdown (although I acknowledge rst -> md will be a
> > > > > > > > >> > > > > > > > >    massive overhaul).
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > *Problem Summary*
> > > > > > > > >> > > > > > > > > I recently set out to implement what I thought would be a
> > > > > > > > >> > > > > > > > > straightforward feature: warn users when they are viewing
> > > > > > > > >> > > > > > > > > documentation for non-current versions of Airflow and link
> > > > > > > > >> > > > > > > > > them to the current/stable version
> > > > > > > > >> > > > > > > > > <https://github.com/apache/airflow/pull/34639>. Jed pointed
> > > > > > > > >> > > > > > > > > me to the airflow-site <https://github.com/apache/airflow-site>
> > > > > > > > >> > > > > > > > > repo, which contains all of the archived docs (that is,
> > > > > > > > >> > > > > > > > > documentation for non-current versions), and from there, I
> > > > > > > > >> > > > > > > > > ran into a brick wall.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > I want to raise some concerns that I've developed after
> > > > > > > > >> > > > > > > > > trying to contribute what feel like a couple of reasonably
> > > > > > > > >> > > > > > > > > small docs updates:
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >    1. airflow-site
> > > > > > > > >> > > > > > > > >       1. Elad pointed out the problem posed by the sheer size
> > > > > > > > >> > > > > > > > >       of archived docs
> > > > > > > > >> > > > > > > > >       <https://apache-airflow.slack.com/archives/CCPRP7943/p1697009000242369?thread_ts=1696973512.004229&cid=CCPRP7943>
> > > > > > > > >> > > > > > > > >       (more on this later).
> > > > > > > > >> > > > > > > > >       2. The airflow-site repo is confusing, and rather poorly
> > > > > > > > >> > > > > > > > >       documented.
> > > > > > > > >> > > > > > > > >          1. Hugo (static site generator) exists, but appears
> > > > > > > > >> > > > > > > > >          to only be used for the landing pages
> > > > > > > > >> > > > > > > > >          2. In order to view any documentation locally other
> > > > > > > > >> > > > > > > > >          than the landing pages, you'll need to run the
> > > > > > > > >> > > > > > > > >          site.sh script then copy the output from one dir to
> > > > > > > > >> > > > > > > > >          another?
> > > > > > > > >> > > > > > > > >       3. All of the archived docs are raw HTML, making
> > > > > > > > >> > > > > > > > >       migrating to a static site generator a significant
> > > > > > > > >> > > > > > > > >       challenge, which makes it difficult to prevent the
> > > > > > > > >> > > > > > > > >       archived docs from continuing to grow and grow. Perhaps
> > > > > > > > >> > > > > > > > >       this is the wheel Khaleesi was referring to
> > > > > > > > >> > > > > > > > >       <https://www.youtube.com/watch?v=J-rxmk6zPxA>?
> > > > > > > > >> > > > > > > > >    2. airflow
> > > > > > > > >> > > > > > > > >       1. Building Airflow docs is a challenge. It takes
> > > > > > > > >> > > > > > > > >       several minutes and doesn't support auto-build, so the
> > > > > > > > >> > > > > > > > >       slightest issue could require waiting again and again
> > > > > > > > >> > > > > > > > >       until the changes are just so. I tried implementing
> > > > > > > > >> > > > > > > > >       sphinx-autobuild
> > > > > > > > >> > > > > > > > >       <https://github.com/executablebooks/sphinx-autobuild>
> > > > > > > > >> > > > > > > > >       to no avail.
> > > > > > > > >> > > > > > > > >       2. Sphinx/restructured text has a steep learning curve.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > *The most acute issue: disk space*
> > > > > > > > >> > > > > > > > > The size of the archived docs is causing the docs build
> > > > > > > > >> > > > > > > > > GitHub Action to almost run out of space. From the "Build
> > > > > > > > >> > > > > > > > > site" Action from a couple of weeks ago
> > > > > > > > >> > > > > > > > > <https://github.com/apache/airflow-site/actions/runs/6419529645/job/17432628458>
> > > > > > > > >> > > > > > > > > (expand the build site step, scroll all the way to the
> > > > > > > > >> > > > > > > > > bottom, expand the `df -h` command), we can see the GitHub
> > > > > > > > >> > > > > > > > > Action runner (or whatever it's called) is nearly running
> > > > > > > > >> > > > > > > > > out of space:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > df -h
> > > > > > > > >> > > > > > > > >   *Filesystem      Size  Used Avail Use% Mounted on*
> > > > > > > >> > > > > > > > >   /dev/root        84G   82G  2.1G  98% /
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > The available space is down to 1.8G on the most recent Action
> > > > > > > > >> > > > > > > > > <https://github.com/apache/airflow-site/actions/runs/6564727255/job/17831714176>.
> > > > > > > > >> > > > > > > > > If we assume that trend is accurate, we have about two months before the
> > > > > > > > >> > > > > > > > > Action runner runs out of disk space. Here's a breakdown of the space
> > > > > > > > >> > > > > > > > > consumed by the 10 largest package documentation directories:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > du -h -d 1 docs-archive/ | sort -h -r
> > > > > > > >> > > > > > > > > * 14G* docs-archive/
> > > > > > > > >> > > > > > > > > *4.0G* docs-archive//apache-airflow-providers-google
> > > > > > > > >> > > > > > > > > *3.2G* docs-archive//apache-airflow
> > > > > > > > >> > > > > > > > > *1.7G* docs-archive//apache-airflow-providers-amazon
> > > > > > > > >> > > > > > > > > *560M* docs-archive//apache-airflow-providers-microsoft-azure
> > > > > > > > >> > > > > > > > > *254M* docs-archive//apache-airflow-providers-cncf-kubernetes
> > > > > > > > >> > > > > > > > > *192M* docs-archive//apache-airflow-providers-apache-hive
> > > > > > > > >> > > > > > > > > *153M* docs-archive//apache-airflow-providers-snowflake
> > > > > > > > >> > > > > > > > > *139M* docs-archive//apache-airflow-providers-databricks
> > > > > > > > >> > > > > > > > > *104M* docs-archive//apache-airflow-providers-docker
> > > > > > > > >> > > > > > > > > *101M* docs-archive//apache-airflow-providers-mysql
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > *Proposed solution: Archive old docs html for large packages to cloud
> > > > > > > > >> > > > > > > > > storage*
> > > > > > > > >> > > > > > > > > I'm wondering if it would be reasonable to truly archive the docs for some
> > > > > > > > >> > > > > > > > > of the older versions of these packages. Perhaps anything older than the
> > > > > > > > >> > > > > > > > > last 18 months? Maybe we could drop the HTML in a blob storage bucket with
> > > > > > > > >> > > > > > > > > instructions for building the docs if absolutely necessary?
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > *Improving docs building moving forward*
> > > > > > > > >> > > > > > > > > There's an open Issue <https://github.com/apache/airflow-site/issues/719>
> > > > > > > > >> > > > > > > > > for migrating the docs to a framework, but it's not at all a
> > > > > > > > >> > > > > > > > > straightforward task for the archived docs. I think that we should
> > > > > > > > >> > > > > > > > > institute a policy of archiving old documentation to cloud storage after X
> > > > > > > > >> > > > > > > > > time and use a framework for building docs in a scalable and sustainable
> > > > > > > > >> > > > > > > > > way moving forward. Maybe we could chat with the Iceberg folks about how
> > > > > > > > >> > > > > > > > > they moved from MkDocs to Hugo?
> > > > > > > > >> > > > > > > > > <https://github.com/apache/iceberg/issues/3616>
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Shoutout to Utkarsh for helping me through all this!
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
