Go ahead, Utkarsh. It would be nice to work with you on this.

Thanks,
Amogh Desai

On Wed, Oct 25, 2023 at 10:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> +1. I think no one will object to improving the current situation :)
>
> On Wed, Oct 25, 2023 at 5:02 PM utkarsh sharma <utkarshar...@gmail.com> wrote:
>
> Hey everyone,
>
> If we have a consensus on the suggestions in my previous email, I would like to subdivide the task into smaller tickets and distribute them among Aritra Basu, Amogh Desai, and myself.
>
> Thanks,
> Utkarsh Sharma
>
> On Tue, Oct 24, 2023 at 10:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Those look like great ideas.
>
> On Tue, Oct 24, 2023 at 4:23 PM utkarsh sharma <utkarshar...@gmail.com> wrote:
>
> Just forgot to mention in my previous mail that I'm suggesting the above changes since storage is not the primary concern right now, but I'm happy to contribute either way. :)
>
> On Tue, Oct 24, 2023 at 7:43 PM utkarsh sharma <utkarshar...@gmail.com> wrote:
>
> Hey everyone,
>
> I have a couple of tasks in mind that might reduce the effort of working with the docs. Right now the tasks listed below are difficult to achieve.
>
> 1. Adding a warning based on a specific provider, a version of a provider, or a range of providers. This is also the task that Ryan was working on.
> 2. Altering a page layout or CSS for a specific provider.
>
> The difficulty with these tasks comes from the pre-prepared static files we get as the final product of building the docs with *breeze build-docs* in the docs/_build folder. The files we get are self-sufficient to be hosted, and they are used directly, leaving no room for customization of any sort.
>
> My proposal would be to break down this process as follows:
>
> 1. We can prepare partial documents as part of *breeze build-docs* which are only responsible for providing the HTML to be populated within the body tag for a specific provider, and not the layout of the entire page.
> 2. We then copy the partial static files to the airflow-site repo within landing pages/site/layouts/docs. There, the layout of the page will be provided by `single.html` and a listing of all the providers by `list.html`, which are standard Hugo <https://gohugo.io/about/what-is-hugo/> features. Also, using the static files from `sphinx_airflow_theme`, which lives in the same repo, makes CSS changes easy.
> 3. We can then use Hugo to generate the static <https://gohugo.io/getting-started/quick-start/#publish-the-site> files and push them to the `gh-pages` branch to publish them using GitHub Pages.
>
> Doing the above changes will enable us to do the following:
>
> 1. It will give us more control to work on a specific provider or provider version if we want, by providing templates - https://gohugo.io/templates/lookup-order/
> 2. We will have specific code to look at depending on the changes one intends to make; right now, if you don't know the flow, it's a bit difficult to pinpoint the code to change.
>    1. If we want to make changes to a specific provider's content, we can do it in the Airflow repo's docs/<provider>/*.rst files.
>    2. If we have a change that affects multiple providers or versions, we can do it in the Airflow website's repo.
>
> Thanks,
> Utkarsh Sharma
>
> On Tue, Oct 24, 2023 at 3:45 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> So it looks like we have some helping hands and we need someone to lead it :) (just saying).
>
> On Tue, Oct 24, 2023 at 8:15 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>
> +1 (non binding) from me on the thought of moving the older docs (~18 months seems ok) to an archive instead of the repository.
>
> Coming to the other problem of copying the built docs into airflow-site for releases, maybe we can fix that using a script? Open for thoughts here.
>
> I would be very happy to help when we start taking this forward, I have some experience in airflow-site and docs side as well. Feel free to reach out over email or slack :)
>
> Thanks & Regards,
> Amogh Desai
>
> On Mon, Oct 23, 2023 at 3:08 AM Aritra Basu <aritrabasu1...@gmail.com> wrote:
>
> This definitely sounds like something that needs doing sooner rather than later.
>
> While I'd love to help, I'm not too experienced with this area so I might not be able to actually propose what changes need doing, but if someone has a path forward on this I can definitely contribute some time to help out given some guidance on what is needed.
>
> --
> Regards,
> Aritra Basu
>
> On Mon, Oct 23, 2023, 2:19 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Some news here.
>
> I caught up with some infra changes that happened while I was travelling - and I have just (with https://github.com/apache/airflow-site/pull/879) switched the "airflow-site" building to the new, self-hosted "asf-runners". This is a new option that ASF infra has given to test for the ASF projects - rather than relying on "public runners", we can switch to self-hosted runners donated by Microsoft to the ASF. More info here:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=ASF+Infra+provided+self-hosted+runners
>
> The most important result is that we now have a lot more "breathing space" for the docs building job. During the build we are using max 59% of the disk space - with 73GB used and 52GB free.
>
> Filesystem      Size  Used Avail Use% Mounted on
> overlay         124G   73G   52G  59% /
>
> This is - on one hand - good news (disk space is not an "acute" issue any more), and I think if someone would like to work on improving the docs building of ours, they have much more breathing space to do so. But - clearly - it might mean that the incentive to work on it decreased, because it "just works".
> That's the bad effect of it. And I think it's not good, though the most I can do is to reiterate Ryan's concerns and hope we will get someone committing to improving this.
>
> I would strongly encourage those who want to improve it to do so. I think - as Ryan stated - contributing to our docs is more complex than it should be and anyone who would like to contribute there is most welcome. I very much share all the points that Ryan made and I think we should welcome any efforts to make it better. The lack of incremental/auto-build support is especially troublesome for anyone who wants to contribute their docs. Happy to help anyone who would like to take on the task.
>
> Still - if we would like to move old docs outside as a first step, I am happy to help anyone who would like to commit to doing it.
>
> J.
>
> On Fri, Oct 20, 2023 at 3:27 PM Pierre Jeambrun <pierrejb...@gmail.com> wrote:
>
> +1 for moving archived docs outside of airflow-site.
>
> Even if that might mean a little more maintenance in case we need to propagate changes to all historical versions (we would have to handle 2 repositories), that seems like a minor downside compared to the quality of life improvement that it would bring for airflow-site contributions.
>
> On Thu, Oct 19, 2023 at 4:11 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Let me just clarify (because that could be unclear) what my +1 was about.
>
> I was not talking (and I believe Ryan was not talking either) about removing the old docs but about archiving them and serving from elsewhere (cloud storage).
>
> I think discussing changing to more shared HTML/JS/CSS is also a good idea to optimise it, but possibly can be handled separately as a longer effort of redesigning how the docs are built. But by all means we could also work on that.
>
> Maybe I jumped to conclusions, but the easiest, tactical solution (for the most acute issue - size) is that we just move the old generated HTML docs out of the git repository of "airflow-site", and in the "github_pages" branch we replace them with redirects from those pages to the files served from the cloud storage (and I believe this is what Ryan hinted at).
>
> Those redirects could be automatically generated for all historical versions and they will be small. We are already doing it for individual pages for navigating between versions, but we could easily replace all the historical docs with "<html><head><meta http-equiv="refresh" content="0; url=https://new-archive-docs-airflow-url/airflow/version/document.url"/></head></html>". Low-tech, surely, and "legacy", but it will solve the size problem instantly. We currently have 115,148 such files, which will go down to about ~20 MB of files, which is peanuts compared to the current 17GB (!) we have.
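
A minimal sketch of how such redirect stubs could be generated, assuming the archived HTML has already been copied to cloud storage under the same relative paths. The `ARCHIVE_BASE_URL` value and both function names are hypothetical; the thread itself only gives a placeholder URL and does not describe a script.

```python
from html import escape
from pathlib import Path

# Hypothetical archive location; the thread only gives a placeholder URL.
ARCHIVE_BASE_URL = "https://new-archive-docs-airflow-url"


def redirect_stub(target_url: str) -> str:
    """Return a minimal HTML page that immediately redirects to target_url."""
    safe_url = escape(target_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="0; url={safe_url}"/>'
        "</head></html>\n"
    )


def replace_with_redirects(docs_root: Path, base_url: str = ARCHIVE_BASE_URL) -> int:
    """Overwrite every .html file under docs_root with a redirect stub.

    The relative path of each file is kept, so old URLs map one-to-one
    onto the archive. Returns the number of files rewritten.
    """
    base = base_url.rstrip("/")
    count = 0
    # sorted() materializes the file list before any file is rewritten.
    for page in sorted(docs_root.rglob("*.html")):
        relative = page.relative_to(docs_root).as_posix()
        page.write_text(redirect_stub(f"{base}/{relative}"), encoding="utf-8")
        count += 1
    return count
```

Each stub is well under 200 bytes for typical paths, which is in line with the estimate of roughly 20 MB for about 115,000 files. Non-HTML assets (CSS, JS, images) are not touched by this sketch and would need to be removed separately.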
>
> We can also inject into the moved "storage" docs, the header that informs that this is an old/archived documentation with single redirect to "live"/"stable" site for newer versions of docs (which I believe sparked Ryan's work). This can be done at least as the "quick" remediation for the size issue and something that might allow the current scheme to work without ever-growing repo/size and using space for the build action. If we have such an automated mechanism in place, we could periodically archive old docs. All that without changing the build process of ours and simply keep old "past" docs elsewhere (still accessible for users).
>
> Not much should change for the users IMHO - if they go to the old version of the docs or use old, archived URLs, they would end up seeing the same content/navigation they see today (with extra information it's an old version and served from a different URL). When they go to the "old" version of documentation they could be redirected to the new one - same HTML but hosted on cloud storage, fully statically. We already do that with "redirect" mechanism.
>
> In the meantime, someone could also work on a strategic solution and change the current build process, but this is, I think, a different step - much more complex and requiring a lot of effort. And it could simply end up with regenerating whatever is left as "live" documentation (leaving the archive docs intact).
>
> That's at least what I see as a possible set of steps to take.
>
> J.
>
> On Thu, Oct 19, 2023 at 2:14 PM utkarsh sharma <utkarshar...@gmail.com> wrote:
>
> Hey everyone,
>
> Thanks, Ryan, for starting the thread :)
>
> Big +1 for archiving docs older than 18 months. We can still make the older docs available in `rst` doc form.
>
> But eventually, we might again run into this problem because of the growing number of providers. I think the main reason for this issue is the generated static HTML pages and the way we serve them using GitHub Pages. The generated pages have lots of common code - HTML (headers/navigation/breadcrumbs/footer etc.), CSS, and JS - which is repeated for every provider and every version of that provider.
> If we have a more dynamic way (Django/Flask servers) of serving the documents, we can save all the space taken by the common HTML/CSS/JS.
>
> But the downsides of this approach are:
> 1. We need to have a server.
> 2. It also requires changes in the existing document build process to only produce partial HTML documents.
>
> Thanks,
> Utkarsh Sharma
>
> On Thu, Oct 19, 2023 at 4:08 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Yes. Moving the old version to somewhere that we can keep/archive static historical versions of those historical docs and publish them from there. What you proposed is exactly the solution I thought might be best as well.
>
> It would be a great task to contribute to the stability of our docs generation in the future.
>
> I don't think it's a matter of discussing in detail how to do it (18 months is a good start and you can parameterize it). It's a matter of someone committing to it and simply doing it :).
>
> So yes I personally am all for it and if I understand correctly that you are looking for agreement on doing it, big +1 from my side - happy to help with providing access to our S3 buckets.
>
> J.
>
> On Thu, Oct 19, 2023 at 5:39 AM Ryan Hatter <ryan.hat...@astronomer.io.invalid> wrote:
>
> *tl;dr*
>
> 1. The GitHub Action for building docs is running out of space. I think we should archive really old documentation for large packages to cloud storage.
> 2. Contributing to and building Airflow docs is hard. We should migrate to a framework, preferably one that uses markdown (although I acknowledge rst -> md will be a massive overhaul).
>
> *Problem Summary*
> I recently set out to implement what I thought would be a straightforward feature: warn users when they are viewing documentation for non-current versions of Airflow and link them to the current/stable version <https://github.com/apache/airflow/pull/34639>.
> Jed pointed me to the airflow-site <https://github.com/apache/airflow-site> repo, which contains all of the archived docs (that is, documentation for non-current versions), and from there, I ran into a brick wall.
>
> I want to raise some concerns that I've developed after trying to contribute what feel like a couple reasonably small docs updates:
>
> 1. airflow-site
>    1. Elad pointed out the problem posed by the sheer size of archived docs <https://apache-airflow.slack.com/archives/CCPRP7943/p1697009000242369?thread_ts=1696973512.004229&cid=CCPRP7943> (more on this later).
>    2. The airflow-site repo is confusing, and rather poorly documented.
>       1. Hugo (static site generator) exists, but appears to only be used for the landing pages
>       2. In order to view any documentation locally other than the landing pages, you'll need to run the site.sh script then copy the output from one dir to another?
>    3. All of the archived docs are raw HTML, making migrating to a static site generator a significant challenge, which makes it difficult to prevent the archived docs from continuing to grow and grow. Perhaps this is the wheel Khaleesi was referring to <https://www.youtube.com/watch?v=J-rxmk6zPxA>?
> 2. airflow
>    1. Building Airflow docs is a challenge. It takes several minutes and doesn't support auto-build, so the slightest issue could require waiting again and again until the changes are just so. I tried implementing sphinx-autobuild <https://github.com/executablebooks/sphinx-autobuild> to no avail.
>    2. Sphinx/restructured text has a steep learning curve.
>
> *The most acute issue: disk space*
> The size of the archived docs is causing the docs build GitHub Action to almost run out of space.
> From the "Build site" Action from a couple weeks ago <https://github.com/apache/airflow-site/actions/runs/6419529645/job/17432628458> (expand the build site step, scroll all the way to the bottom, expand the `df -h` command), we can see the GitHub Action runner (or whatever it's called) is nearly running out of space:
>
> df -h
> *Filesystem      Size  Used Avail Use% Mounted on*
> /dev/root        84G   82G  2.1G  98% /
>
> The available space is down to 1.8G on the most recent Action <https://github.com/apache/airflow-site/actions/runs/6564727255/job/17831714176>. If we assume that trend is accurate, we have about two months before the Action runner runs out of disk space.
> Here's a breakdown of the space consumed by the 10 largest package documentation directories:
>
> du -h -d 1 docs-archive/ | sort -h -r
> * 14G* docs-archive/
> *4.0G* docs-archive//apache-airflow-providers-google
> *3.2G* docs-archive//apache-airflow
> *1.7G* docs-archive//apache-airflow-providers-amazon
> *560M* docs-archive//apache-airflow-providers-microsoft-azure
> *254M* docs-archive//apache-airflow-providers-cncf-kubernetes
> *192M* docs-archive//apache-airflow-providers-apache-hive
> *153M* docs-archive//apache-airflow-providers-snowflake
> *139M* docs-archive//apache-airflow-providers-databricks
> *104M* docs-archive//apache-airflow-providers-docker
> *101M* docs-archive//apache-airflow-providers-mysql
>
> *Proposed solution: Archive old docs html for large packages to cloud storage*
> I'm wondering if it would be reasonable to truly archive the docs for some of the older versions of these packages. Perhaps the last 18 months? Maybe we could drop the html in a blob storage bucket with instructions for building the docs if absolutely necessary?
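
One way the age cutoff could be parameterized is sketched below. It assumes a release date is available for every documented version; where those dates would come from is not covered in the thread, so the `release_dates` mapping and the function name are hypothetical. A month is approximated as 30 days to keep the sketch dependency-free.

```python
from datetime import date, timedelta


def versions_to_archive(
    release_dates: dict[str, date],
    today: date,
    max_age_months: int = 18,
) -> list[str]:
    """Return the versions released before the cutoff, oldest first.

    release_dates maps a version label to its release date (hypothetical
    input). A month is approximated as 30 days.
    """
    cutoff = today - timedelta(days=30 * max_age_months)
    old = [version for version, released in release_dates.items() if released < cutoff]
    return sorted(old, key=release_dates.__getitem__)
```

Keeping `max_age_months` as a parameter lets the retention window be tuned later without touching the selection logic.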
>
> *Improving docs building moving forward*
> There's an open Issue <https://github.com/apache/airflow-site/issues/719> for migrating the docs to a framework, but it's not at all a straightforward task for the archived docs. I think that we should institute a policy of archiving old documentation to cloud storage after X time and use a framework for building docs in a scalable and sustainable way moving forward. Maybe we could chat with iceberg folks about how they moved from mkdocs to hugo? <https://github.com/apache/iceberg/issues/3616>
>
> Shoutout to Utkarsh for helping me through all this!