potiuk commented on code in PR #50464:
URL: https://github.com/apache/airflow/pull/50464#discussion_r2084979151


##########
docs/README.md:
##########
@@ -37,3 +54,196 @@ Documentation for general overview and summaries not connected with any specific
 
 * `docker-stack-docs` - documentation for Docker Stack
 * `providers-summary-docs` - documentation for provider summary page
+
+# Architecture of documentation for Airflow
+
+Building documentation for Airflow is optimized for speed and for the convenience of the release
+managers and committers who publish and fix the documentation. That's why it is a little complex:
+multiple repositories and multiple sources of documentation are involved.
+
+There are a few repositories under the `apache` organization that are used to build the documentation for Airflow:
+
+* `apache-airflow` - the repository with the code and the documentation sources for Airflow distributions,
+   provider distributions, the providers summary and the Docker Stack summary: [apache-airflow](https://github.com/apache/airflow).
+   From here we publish the documentation to the S3 bucket where it is hosted.
+* `airflow-site` - the repository with the website theme and content, where we keep the sources of the website
+   structure, navigation and theme: [airflow-site](https://github.com/apache/airflow-site). From here
+   we publish the website to the ASF servers, where it is served as the [official website](https://airflow.apache.org).
+* `airflow-site-archive` - here we keep the archived historical versions of the generated documentation
+   for all the documentation packages that we keep on S3. This repository is automatically synchronized from
+   the S3 buckets and is only used when we need to perform a bulk update of historical documentation. Only
+   generated `html`, `css`, `js` and image files are kept here; no sources of the documentation are kept.
+
+We have two S3 buckets where we can publish the documentation generated from the `apache-airflow` repository:
+
+* `s3://live-docs-airflow-apache-org/docs/` - live, [official documentation](https://airflow.apache.org/docs/)
+* `s3://staging-docs-airflow-apache-org/docs/` - [staging documentation](https://staging-airflow.apache.org/docs/) TODO: make it work
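+
+The bucket layout can be inspected with the AWS CLI. This is a sketch, assuming you have read access
+to the buckets; the `docs/` prefix comes from the bucket paths above:
+
+```shell
+# List the documentation packages published in the live bucket
+aws s3 ls s3://live-docs-airflow-apache-org/docs/
+
+# List the published versions of a single package, e.g. apache-airflow
+aws s3 ls s3://live-docs-airflow-apache-org/docs/apache-airflow/
+```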
+
+# Diagrams of the documentation architecture
+
+This is the diagram of the live documentation architecture:
+
+![Live documentation architecture](images/documentation_architecture.png)
+
+The staging documentation architecture is similar, but uses the staging bucket and the staging Apache website.
+
+# Typical workflows
+
+There are a few typical workflows that we support:
+
+## Publishing the documentation by the release manager
+
+The release manager publishes the documentation using the GitHub Actions workflow
+[Publish Docs to S3](https://github.com/apache/airflow/actions/workflows/publish-docs-to-s3.yml).
+The same workflow can be used to publish Airflow, Helm chart and provider documentation.
+
+The person who triggers the build (the release manager) should specify the tag name of the docs to be published
+and the list of documentation packages to be published. Usually it is:
+
+* Airflow: `apache-airflow docker-stack` (later we will add `airflow-ctl` and `task-sdk`)
+* Helm chart: `helm-chart`
+* Providers: `provider_id1 provider_id2`, or `all-providers` if all providers should be published.
+
+Optionally - specifically when running `all-providers` - if the release manager wants to exclude some providers,
+they can specify documentation packages to exclude. Leaving "no-docs-excluded" will publish all the packages
+specified above, without exclusions.
+
+You can also specify whether the "live" or "staging" documentation should be published. The default is "live".
+
+Example screenshot of the workflow triggered from the GitHub UI:
+
+![Publishing airflow or providers](images/publish_airflow.png)
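+
+The same trigger can also be issued from the command line with the GitHub CLI. This is only a sketch:
+the input names (`ref`, `include-docs`, `destination`) are hypothetical and should be checked against the
+workflow file before use:
+
+```shell
+# Trigger the "Publish Docs to S3" workflow for the 3.0.1 tag
+# (input names below are assumptions - verify them in publish-docs-to-s3.yml)
+gh workflow run publish-docs-to-s3.yml \
+  --repo apache/airflow \
+  -f ref=3.0.1 \
+  -f include-docs="apache-airflow docker-stack" \
+  -f destination=live
+```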
+
+Note that this just publishes the documentation but does not update the "site" with version numbers or
+stable links to providers and Airflow. If you release a new documentation version, it will be available
+under a direct URL (say https://airflow.apache.org/docs/apache-airflow/3.0.1/), but the main site will still
+point to the previous version of the documentation as `stable`, and the version drop-downs will not be updated.
+
+In order to do that, you need to run the [Build docs](https://github.com/apache/airflow-site/actions/workflows/build.yml)
+workflow in the `airflow-site` repository. This will build the website and publish it to the `publish`
+branch of the `airflow-site` repository, including refreshing the version numbers in the drop-downs and
+the stable links.
+
+![Publishing site](images/publish_site.png)
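+
+A hedged CLI equivalent of the trigger above, assuming the workflow accepts a plain manual dispatch
+with no inputs:
+
+```shell
+# Trigger the site build on the main branch of airflow-site
+gh workflow run build.yml --repo apache/airflow-site --ref main
+```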
+
+
+Some time after the workflow succeeds and the documentation is published in the live bucket, the `airflow-site-archive`
+repository is automatically synchronized with the live S3 bucket. TODO: IMPLEMENT THIS, FOR NOW IT HAS
+TO BE MANUALLY SYNCHRONIZED VIA THE [Sync s3 to GitHub](https://github.com/apache/airflow-site-archive/actions/workflows/s3-to-github.yml)
+workflow in the `airflow-site-archive` repository. The `airflow-site-archive` repository essentially keeps the
+history of snapshots of the `live` documentation.
+
+## Publishing changes to the website (including theme)
+
+The workflows in `apache-airflow` only update the documentation for the packages (Airflow, Helm chart,
+providers, Docker Stack) that we publish from the Airflow sources. If we want to publish changes to the website
+itself or to the theme (CSS, JavaScript), we need to do it in the `airflow-site` repository.
+
+Publishing of `airflow-site` happens automatically when a PR from `airflow-site` is merged to `main`, or when
+the [Build docs](https://github.com/apache/airflow-site/actions/workflows/build.yml) workflow is triggered
+manually on the main branch of the `airflow-site` repository. The workflow builds the website and publishes it
+to the `publish` branch of the `airflow-site` repository, which in turn gets picked up by the ASF servers and is
+published as the official website. This includes any changes to the `.htaccess` of the website.
+
+Such a main build also publishes the latest "sphinx-airflow-theme" package to GitHub, so that the next build
+of the documentation can automatically pick it up from there. This means that if you want to make changes to
+the `javascript` or `css` that are part of the theme, you need to do it in the `airflow-site` repository and
+merge it to the `main` branch in order to be able to run the documentation build in the `apache-airflow`
+repository and pick up the latest version of the theme.
+
+The version of the Sphinx theme is fixed in both repositories:
+
+* https://github.com/apache/airflow-site/blob/main/sphinx_airflow_theme/sphinx_airflow_theme/__init__.py#L21
+* https://github.com/apache/airflow/blob/main/devel-common/pyproject.toml#L77 in the "docs" section
+
+In case of bigger changes to the theme, we can first iterate on the website and merge a new theme version,
+and only after that switch the documentation build to the new version of the theme.
+
+
+## Fixing historical documentation
+
+Sometimes we need to update historical documentation (modify the generated `html`) - for example, when we find
+bad links or when we change some of the structure of the documentation. This can be done via the
+`airflow-site-archive` repository. The workflow is as follows:
+
+1. Get the latest version of the documentation from S3 to the `airflow-site-archive` repository using the
+   `Sync s3 to GitHub` workflow. This will download the latest version of the documentation from S3 to
+   the `airflow-site-archive` repository (this should normally not be needed if the automated synchronization
+   works).
+2. Make the changes to the documentation in the `airflow-site-archive` repository. This can be done with any
+   text editor, scripts, etc. Those files are generated `html` files and are not meant to be regenerated;
+   they should be modified as `html` files in-place.
+3. Commit the changes to the `airflow-site-archive` repository and push them to `some` branch of the repository.
+4. Run the `Sync GitHub to S3` workflow in the `airflow-site-archive` repository. This will upload the modified
+   documentation to the S3 bucket.
+5. You can choose whether to sync the changes to the `live` or `staging` bucket. The default is `live`.
+6. By default, the workflow will synchronize all documentation modified in a single commit - the last commit
+   pushed to the branch you specified. You can also specify "full_sync" to synchronize all files in the
+   repository.
+7. In case you specify "full_sync", you can also synchronize `all` docs or only selected documentation
+   packages (for example `apache-airflow`, `docker-stack`, `amazon` or `helm-chart`) - you can specify
+   more than one package, separated by spaces.
+8. After you synchronize the changes to S3, the `Sync s3 to GitHub` workflow will be triggered
+   automatically and the changes will be synchronized to the `main` branch of `airflow-site-archive` - so there
+   is no need to merge your changes to the `main` branch of the `airflow-site-archive` repository. You can
+   safely delete the branch you created in step 3.
+
+![Sync GitHub to S3](images/sync_github_to_s3.png)
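+
+The steps above can be sketched as a shell session. The repository URL is real, but the branch name, the
+edited file, the workflow file name (`github-to-s3.yml`, mirroring `s3-to-github.yml`) and the workflow
+inputs are illustrative assumptions - check the actual workflow file before running anything like this:
+
+```shell
+# Step 1 is normally not needed; start from a fresh clone of the archive
+git clone https://github.com/apache/airflow-site-archive.git
+cd airflow-site-archive
+
+# Steps 2-3: fix the generated html in-place on a scratch branch and push it
+git switch -c fix-bad-links                        # hypothetical branch name
+sed -i 's|http://old.example|https://new.example|g' \
+  apache-airflow/2.10.5/index.html                 # hypothetical file and fix
+git commit -am "Fix bad links in historical docs"
+git push origin fix-bad-links
+
+# Step 4: trigger the upload to S3 (workflow file and input names are assumptions)
+gh workflow run github-to-s3.yml --repo apache/airflow-site-archive \
+  --ref fix-bad-links -f destination=live
+```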
+
+
+## Manually publishing documentation directly to S3

Review Comment:
   Sure. We can add "Ask in #internal-airflow-ci-cd channel on slack"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to