potiuk commented on code in PR #50464:
URL: https://github.com/apache/airflow/pull/50464#discussion_r2084979151
##########
docs/README.md:
##########
@@ -37,3 +54,196 @@ Documentation for general overview and summaries not connected with any specific
 * `docker-stack-docs` - documentation for Docker Stack
 * `providers-summary-docs` - documentation for provider summary page
+
+# Architecture of documentation for Airflow
+
+Building documentation for Airflow is optimized for speed and for the convenience of the workflows of the
+release managers and committers who publish and fix the documentation - that's why it is a little complex,
+as multiple repositories and multiple sources of documentation are involved.
+
+There are a few repositories under the `apache` organization that are used to build the documentation for Airflow:
+
+* `apache-airflow` - the repository with the code and the documentation sources for Airflow distributions,
+  provider distributions, the providers summary and the Docker stack summary: [apache-airflow](https://github.com/apache/airflow).
+  From here we publish the documentation to the S3 bucket where the documentation is hosted.
+* `airflow-site` - the repository with the website theme and content, where we keep the sources of the website
+  structure, navigation and theme: [airflow-site](https://github.com/apache/airflow-site). From here
+  we publish the website to the ASF servers, where it is published as the [official website](https://airflow.apache.org).
+* `airflow-site-archive` - here we keep the archived historical versions of the generated documentation
+  of all the documentation packages that we keep on S3. This repository is automatically synchronized from
+  the S3 buckets and is only used in case we need to perform a bulk update of historical documentation. Only
+  generated `html`, `css`, `js` and image files are kept here - no sources of the documentation.
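The bulk-update scenario mentioned for `airflow-site-archive` - editing already-generated `html` files in place - might be scripted along these lines. This is a minimal sketch; the `fix_links_in_place` helper and its behavior are hypothetical, not an existing tool in the repository:

```python
# Hypothetical helper for a bulk update of archived, generated documentation:
# rewrite a bad link in every generated .html file under a local checkout of
# airflow-site-archive (css/js/images are left untouched).
from pathlib import Path


def fix_links_in_place(root: Path, old: str, new: str) -> int:
    """Replace `old` with `new` in all .html files under `root`, in place.

    Returns the number of files that were modified.
    """
    changed = 0
    for page in root.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        if old in text:
            page.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

The changed files would then be committed and pushed to a branch, as described in the "Fixing historical documentation" workflow below.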
+
+We have two S3 buckets to which we can publish the documentation generated from the `apache-airflow` repository:
+
+* `s3://live-docs-airflow-apache-org/docs/` - live, [official documentation](https://airflow.apache.org/docs/)
+* `s3://staging-docs-airflow-apache-org/docs/` - [staging documentation](https://staging-airflow.apache.org/docs/) TODO: make it work
+
+# Diagrams of the documentation architecture
+
+This is the diagram of the live documentation architecture:
+
+
+
+The staging documentation architecture is similar, but uses the staging bucket and the staging Apache website.
+
+# Typical workflows
+
+There are a few typical workflows that we support:
+
+## Publishing the documentation by the release manager
+
+The release manager publishes the documentation using the GitHub Actions workflow
+[Publish Docs to S3](https://github.com/apache/airflow/actions/workflows/publish-docs-to-s3.yml).
+The same workflow can be used to publish the Airflow, Helm chart and provider documentation.
+
+The person who triggers the build (the release manager) should specify the tag name of the docs to be
+published and the list of documentation packages to be published. Usually it is:
+
+* Airflow: `apache-airflow docker-stack` (later we will add `airflow-ctl` and `task-sdk`)
+* Helm chart: `helm-chart`
+* Providers: `provider_id1 provider_id2`, or `all-providers` if all providers should be published.
+
+Optionally - specifically when running `all-providers` - the release manager can specify documentation
+packages to exclude. Leaving the default "no-docs-excluded" will publish all the specified packages
+without exclusions.
+
+You can also specify whether the "live" or "staging" documentation should be published. The default is "live".
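The input handling described above might be resolved along these lines. This is a sketch only - the function and variable names are hypothetical; just the bucket paths and the `all-providers` / `no-docs-excluded` values come from the documentation:

```python
# Sketch of how the "Publish Docs to S3" workflow inputs described above could
# resolve to a target bucket and a final list of documentation packages.
BUCKETS = {
    "live": "s3://live-docs-airflow-apache-org/docs/",
    "staging": "s3://staging-docs-airflow-apache-org/docs/",
}


def resolve_packages(requested: str, excluded: str, all_providers: list) -> list:
    """Expand 'all-providers' and drop any excluded documentation packages."""
    packages = all_providers if requested.strip() == "all-providers" else requested.split()
    if excluded.strip() == "no-docs-excluded":
        return list(packages)
    skip = set(excluded.split())
    return [p for p in packages if p not in skip]
```

For example, `resolve_packages("all-providers", "google", ["amazon", "google"])` would keep only `amazon`, and the chosen environment ("live" or "staging") would select the bucket from `BUCKETS`.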
+
+Example screenshot of the workflow triggered from the GitHub UI:
+
+
+
+Note that this just publishes the documentation, but does not update the "site" with version numbers or
+stable links to providers and Airflow - if you release a new documentation version, it will be available
+under a direct URL (say https://airflow.apache.org/docs/apache-airflow/3.0.1/), but the main site will
+still point to the previous version of the documentation as `stable`, and the version drop-downs will not
+be updated.
+
+In order to do that, you need to run the [Build docs](https://github.com/apache/airflow-site/actions/workflows/build.yml)
+workflow in the `airflow-site` repository. This will build the website and publish it to the `publish`
+branch of the `airflow-site` repository, including refreshing the version numbers in the drop-downs and
+the stable links.
+
+
+
+Some time after the workflow succeeds and the documentation is published in the live bucket, the
+`airflow-site-archive` repository is automatically synchronized with the live S3 bucket. TODO: IMPLEMENT
+THIS, FOR NOW IT HAS TO BE MANUALLY SYNCHRONIZED via the
+[Sync s3 to GitHub](https://github.com/apache/airflow-site-archive/actions/workflows/s3-to-github.yml)
+workflow in the `airflow-site-archive` repository. The `airflow-site-archive` repository essentially keeps
+the history of snapshots of the `live` documentation.
+
+## Publishing changes to the website (including the theme)
+
+The workflows in `apache-airflow` only update the documentation for the packages (Airflow, Helm chart,
+providers, Docker stack) that we publish from the Airflow sources. If we want to publish changes to the
+website itself or to the theme (CSS, JavaScript), we need to do it in the `airflow-site` repository.
+
+Publishing of `airflow-site` happens automatically when a PR from `airflow-site` is merged to `main`, or when
+the [Build docs](https://github.com/apache/airflow-site/actions/workflows/build.yml) workflow is triggered
+manually in the `main` branch of the `airflow-site` repository. The workflow builds the website and publishes
+it to the `publish` branch of the `airflow-site` repository, which in turn gets picked up by the ASF servers
+and is published as the official website. This includes any changes to the `.htaccess` of the website.
+
+Such a `main` build also publishes the latest "sphinx-airflow-theme" package to GitHub, so that the next
+build of the documentation can automatically pick it up from there. This means that if you want to make
+changes to the `javascript` or `css` that are part of the theme, you need to do it in the `airflow-site`
+repository and merge it to the `main` branch in order to be able to run the documentation build in the
+`apache-airflow` repository and pick up the latest version of the theme.
+
+The version of the Sphinx theme is pinned in both repositories:
+
+* https://github.com/apache/airflow-site/blob/main/sphinx_airflow_theme/sphinx_airflow_theme/__init__.py#L21
+* https://github.com/apache/airflow/blob/main/devel-common/pyproject.toml#L77 in the "docs" section
+
+In case of bigger changes to the theme, we can first iterate on the website and merge and publish a new
+theme version, and only after that switch `apache-airflow` to the new version of the theme.
+
+
+# Fixing historical documentation
+
+Sometimes we need to update historical documentation (modify the generated `html`) - for example, when we
+find bad links or when we change some of the structure of the documentation. This can be done via the
+`airflow-site-archive` repository. The workflow is as follows:
+
+1. Get the latest version of the documentation from S3 to the `airflow-site-archive` repository using the
+   `Sync s3 to GitHub` workflow. This downloads the latest version of the documentation from S3 to the
+   `airflow-site-archive` repository (this should normally not be needed, if the automated synchronization
+   works).
+2. Make the changes to the documentation in the `airflow-site-archive` repository. This can be done with
+   any text editor, scripts, etc.
Those files are generated `html` files and are not meant to be
+   regenerated - they should be modified in place, as `html`.
+3. Commit the changes in the `airflow-site-archive` repository and push them to `some` branch of the repository.
+4. Run the `Sync GitHub to S3` workflow in the `airflow-site-archive` repository. This will upload the
+   modified documentation to the S3 bucket.
+5. You can choose whether to sync the changes to the `live` or `staging` bucket. The default is `live`.
+6. By default, the workflow will synchronize all documentation modified in a single commit - the last
+   commit pushed to the branch you specified. You can also specify "full_sync" to synchronize all files in
+   the repository.
+7. In case you specify "full_sync", you can also synchronize `all` docs or only selected documentation
+   packages (for example `apache-airflow`, `docker-stack`, `amazon` or `helm-chart`) - you can specify
+   more than one package, separated by spaces.
+8. After you synchronize the changes to S3, the `Sync s3 to GitHub` workflow will be triggered
+   automatically and the changes will be synchronized to the `main` branch of `airflow-site-archive` - so
+   there is no need to merge your changes to the `main` branch of the `airflow-site-archive` repository.
+   You can safely delete the branch you created in step 3.
+
+
+## Manually publishing documentation directly to S3

Review Comment:
   Sure. We can add "Ask in #internal-airflow-ci-cd channel on slack"

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org