potiuk commented on code in PR #50464:
URL: https://github.com/apache/airflow/pull/50464#discussion_r2083628581


##########
docs/README.md:
##########
@@ -37,3 +53,162 @@ Documentation for general overview and summaries not connected with any specific

 * `docker-stack-docs` - documentation for Docker Stack
 * `providers-summary-docs` - documentation for provider summary page
+
+# Architecture of documentation for Airflow
+
+Building documentation for Airflow is optimized for speed and for the convenience of the release managers and committers who publish and fix the documentation - that's why it is a little complex, as multiple repositories and multiple sources of documentation are involved.
+
+There are a few repositories under the `apache` organization that are used to build the documentation for Airflow:
+
+* `apache-airflow` - the repository with the code and the documentation sources for Airflow distributions, provider distributions, the providers summary and the Docker stack summary: [apache-airflow](https://github.com/apache/airflow). From here we publish the documentation to the S3 bucket where the documentation is hosted.
+* `airflow-site` - the repository with the website theme and content, where we keep the sources of the website structure, navigation and theme: [airflow-site](https://github.com/apache/airflow-site). From here we publish the website to the ASF servers, where it is published as the [official website](https://airflow.apache.org).
+* `airflow-site-archive` - here we keep the archived historical versions of the generated documentation for all the documentation packages that we keep on S3. This repository is automatically synchronized from the S3 buckets and is only used when we need to perform a bulk update of historical documentation. Only generated `html`, `css`, `js` and image files are kept here; no documentation sources are kept.
+
+We have two S3 buckets to which we can publish the documentation generated from the `apache-airflow` repository:
+
+* `s3://live-docs-airflow-apache-org/docs/` - the live [official documentation](https://airflow.apache.org/docs/)
+* `s3://staging-docs-airflow-apache-org/docs/` - the [staging documentation](https://staging-airflow.apache.org/docs/) TODO: make it work
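
As a quick sanity check, you can list what is currently published in a bucket. This is only a sketch - it assumes you have AWS credentials with read access to the bucket, and the per-package prefix layout under `/docs/` is an assumption to verify against the actual bucket:

```shell
# List the top-level documentation packages in the live bucket.
# Assumes configured AWS credentials with read access to the bucket;
# the per-package prefix layout is an assumption - verify it first.
aws s3 ls s3://live-docs-airflow-apache-org/docs/
```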
+
+# Diagrams of the documentation architecture
+
+This is the diagram of the live documentation architecture:
+
+![Live documentation architecture](images/documentation_architecture.png)
+
+The staging documentation architecture is similar, but uses the staging bucket and the staging Apache website.
+
+# Typical workflows
+
+There are a few typical workflows that we support:
+
+## Publishing the documentation by the release manager
+
+The release manager publishes the documentation using the `Publish Docs to S3` GitHub Actions workflow (accessible via the [GitHub UI](https://github.com/apache/airflow/actions/workflows/publish-docs-to-s3.yml)). The same workflow can be used to publish Airflow, Helm chart and provider documentation.
+
+The person who triggers the build (the release manager) should specify the tag name of the docs to be published and the list of documentation packages to publish. Usually it is:
+
+* Airflow: `apache-airflow docker-stack` (later we will add `airflow-ctl` and `task-sdk`)
+* Helm chart: `helm-chart`
+* Providers: `provider_id1 provider_id2`, or `all-providers` if all providers should be published.
+
+Optionally - specifically when `all-providers` is run and the release manager wants to exclude some providers - they can specify documentation packages to exclude. Leaving the default "no-docs-excluded" will publish all the specified packages without exclusions.
+
+You can also specify whether "live" or "staging" documentation should be 
published. The default is "live".
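
The same trigger can be sketched from the command line with the GitHub CLI. The workflow file name comes from the link above, but all the input names below (the tag, the package list, the exclusions and the live/staging switch) are assumptions - check them against the workflow definition before use:

```shell
# Hypothetical CLI equivalent of triggering the workflow from the GitHub UI.
# All -f input names are assumptions - verify them in publish-docs-to-s3.yml.
gh workflow run publish-docs-to-s3.yml \
  --repo apache/airflow \
  -f tag=<tag-name> \
  -f docs-packages="apache-airflow docker-stack" \
  -f docs-packages-excluded="no-docs-excluded" \
  -f destination=live
```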
+
+Example screenshot of the workflow triggered from the GitHub UI:
+
+![Publishing airflow](images/publish_airflow.png)
+
+Right after the workflow succeeds and the documentation is published in the live bucket, the `airflow-site-archive` repository is automatically synchronized with the live S3 bucket. TODO: IMPLEMENT THIS; FOR NOW IT HAS TO BE MANUALLY SYNCHRONIZED VIA THE [Sync s3 to GitHub](https://github.com/apache/airflow-site-archive/actions/workflows/s3-to-github.yml) workflow in the `airflow-site-archive` repository. The `airflow-site-archive` repository essentially keeps the history of snapshots of the `live` documentation.
+
+Another thing that is triggered automatically (TODO: IMPLEMENT THIS) is building the `airflow-site` and re-publishing it. This is needed in order to refresh the version numbers of the published documentation. After you publish a new version of the documentation, the "stable" version of the documentation is automatically replaced with the new version; however, rebuilding `airflow-site` is needed to make sure that the drop-down version numbers are updated properly.
+
+
+## Publishing changes to the website (including theme)
+
+The workflows in `apache-airflow` only update the documentation for the documentation packages (Airflow, Helm chart, providers, Docker stack) that we publish from the Airflow sources. If we want to publish changes to the website itself or to the theme (CSS, JavaScript), we need to do it in the `airflow-site` repository.
+
+Publishing of `airflow-site` happens automatically when a PR from `airflow-site` is merged to `main`, or when the [Build docs](https://github.com/apache/airflow-site/actions/workflows/build.yml) workflow is triggered manually on the main branch of the `airflow-site` repository. The workflow builds the website and publishes it to the `publish` branch of the `airflow-site` repository, which in turn gets picked up by the ASF servers and is published as the official website. This includes any changes to the `.htaccess` of the website.
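
The manual trigger described above can be sketched with the GitHub CLI (assuming you have the required permissions on the repository; the workflow file name is taken from the link above):

```shell
# Trigger the website build manually on the main branch of airflow-site.
gh workflow run build.yml --repo apache/airflow-site --ref main
```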
+
+Such a main-branch build also publishes the latest `sphinx-airflow-theme` package to GitHub so that the next documentation build can automatically pick it up from there. This means that if you want to make changes to the `javascript` or `css` that are part of the theme, you need to do it in the `airflow-site` repository and merge it to the `main` branch, in order to be able to run the documentation build in the `apache-airflow` repository and pick up the latest version of the theme.
+
+The version of the Sphinx theme is pinned in both repositories:
+
+* https://github.com/apache/airflow-site/blob/main/sphinx_airflow_theme/sphinx_airflow_theme/__init__.py#L21
+* https://github.com/apache/airflow/blob/main/devel-common/pyproject.toml#L77 in the "docs" section
+
+In case of bigger changes to the theme, we can first iterate on the website and merge a new theme version in `airflow-site`, and only after that switch the documentation build to the new version of the theme.
+
+
+## Fixing historical documentation
+
+Sometimes we need to update historical documentation (modify the generated `html`) - for example when we find bad links or when we change some of the structure of the documentation. This can be done via the `airflow-site-archive` repository. The workflow is as follows:
+
+* Get the latest version of the documentation from S3 into the `airflow-site-archive` repository using the `Sync s3 to GitHub` workflow. This downloads the latest version of the documentation from S3 to the `airflow-site-archive` repository (normally this should not be needed if the automated synchronization works).
+
+* Make the changes to the documentation in the `airflow-site-archive` repository. This can be done using any text editor, scripts, etc. Those files are generated `html` files and are not meant to be regenerated; they should be modified as `html` files in place.
+
+* Commit the changes to the `airflow-site-archive` repository and push them to a branch of the repository.
+
+* Run the `Sync GitHub to S3` workflow in the `airflow-site-archive` repository. This uploads the modified documentation to the S3 bucket.
+
+* You can choose whether to sync the changes to the `live` or the `staging` bucket. The default is `live`.
+
+* By default, the workflow synchronizes all documentation modified in a single commit - the last commit pushed to the branch you specified. You can also specify "full_sync" to synchronize all files in the repository.
+
+* In case you specify "full_sync", you can also synchronize `all` docs or only selected documentation packages (for example `apache-airflow`, `docker-stack`, `amazon` or `helm-chart`) - you can specify more than one package, separated by spaces.
+
+* After you synchronize the changes to S3, the `Sync s3 to GitHub` workflow is triggered automatically and the changes are synchronized to the `main` branch of `airflow-site-archive` - so there is no need to merge your changes to `main` yourself.
+
+![Sync GitHub to S3](images/sync_github_to_s3.png)
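
As an illustration of the editing step above, a bulk fix of a bad link across the generated `html` files can be done with standard tools. This is only a sketch - the URLs below are placeholders, not real links:

```shell
#!/usr/bin/env bash
# Sketch: rewrite a bad link in every generated HTML file in place.
# OLD_URL and NEW_URL are placeholders - substitute the real links.
set -euo pipefail

OLD_URL="https://old.example.com/page"
NEW_URL="https://new.example.com/page"

# xargs -r: do nothing if no .html files are found; sed -i edits in place.
find . -name '*.html' -print0 |
  xargs -0 -r sed -i "s|${OLD_URL}|${NEW_URL}|g"
```

Review the resulting diff with `git diff` before committing, since such a change typically touches many files at once.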
+
+
+## Manually publishing documentation
+
+The regular publishing workflows involve running GitHub Actions workflows and cover the majority of cases; however, sometimes manual updates and cherry-picks are needed, when we discover problems with the publishing and doc-building code - for example when we find that we need to fix Sphinx extensions.
+
+In such a case, the release manager or a committer can build and publish the documentation locally - provided that they configure AWS credentials to be able to upload files to S3.
+
+You can check out locally the version of the Airflow repo that you need and apply any cherry-picks before publishing.
+
+This is done using Breeze. You also need to have the AWS CLI installed and credentials configured to be able to upload files to S3. You can get credentials from one of the admins of Airflow's AWS account. The AWS region to set is `us-east-2`.
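
A minimal sketch of the upload step, assuming the documentation has already been built locally with Breeze. The local output directory and the per-package bucket prefix shown here are assumptions - verify the exact paths before running. The `--dryrun` flag of `aws s3 sync` previews the changes without uploading anything:

```shell
# Sketch of manually syncing locally built docs to the live bucket.
# LOCAL_DOCS_DIR and the bucket prefix layout are assumptions - verify them.
export AWS_DEFAULT_REGION=us-east-2

LOCAL_DOCS_DIR="./generated/apache-airflow/stable"
S3_DEST="s3://live-docs-airflow-apache-org/docs/apache-airflow/stable/"

# First preview what would change - --dryrun makes no modifications.
aws s3 sync "${LOCAL_DOCS_DIR}" "${S3_DEST}" --dryrun

# When the preview looks right, run the same command without --dryrun:
# aws s3 sync "${LOCAL_DOCS_DIR}" "${S3_DEST}"
```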
+

Review Comment:
   Added the second manual update option and mentioned `--dry-run`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
