timsaucer opened a new pull request, #1579:
URL: https://github.com/apache/datafusion-python/pull/1579
# Which issue does this PR close?
Closes #.
# Rationale for this change
Phase 2 of the documentation-site refresh started in #1578. With the
modern pydata-sphinx-theme + navigation in place, this PR moves the
content format off `.rst` and onto MyST `.md`. The motivation:
- Markdown is the lingua franca of agent-tuned tooling. LLMs trained
on GitHub and modern docs parse Markdown reliably; reStructuredText
is a minority dialect that frequently confuses both humans editing
via PR review and agents reading the source. The Apache
`datafusion-comet` sibling project completed the same migration
recently and reported smoother contributor onboarding.
- MyST is a strict superset of CommonMark with directives for the
Sphinx features we actually use (toctrees, cross-references,
code-blocks, admonitions, eval-rst escape hatch).
- The `myst-parser` extension is already in the docs dependency
group and was loaded by `conf.py` even before this PR — switching
the on-disk format is a low-risk, mechanical change.
This PR stacks on #1578 (theme + navbar refresh). It should land
after #1578.
# What changes are included in this PR?
Format conversion (mechanical, via `rst-to-myst`):
- 33 human-authored `.rst` files under `docs/source/` become 33
`.md` files — the user guide, contributor guide, IO subsection,
common-operations subsection, dataframe subsection, top-level
`index`, and `links`.
- Toctrees, cross-references, code blocks, hyperlinks, admonitions,
and license headers all round-trip cleanly.
Manual fixes layered on top of the converter output:
- **Cross-reference anchors.** The converter kebab-cased every
`(label)=` anchor (e.g. `(io-csv)=`), but every `{ref}` in the
corpus — including the Python docstrings that `sphinx-autoapi`
pulls into the API reference — still uses the underscore form
(``{ref}`CSV <io_csv>` ``). Rewrite the anchors back to underscore
form (`(io_csv)=`, `(window_functions)=`, `(user_guide_concepts)=`,
`(execution_metrics)=`, etc.) so existing references resolve
without churning every callsite.
- **MyST extensions.** Enable `colon_fence` and `deflist` in
`myst_enable_extensions` (the converter emits both syntaxes in a few
files, notably `dataframe/execution-metrics.md`).
- **`source_suffix`.** Keep `.rst` registered even though no
human-authored RST remains: `sphinx-autoapi` generates `.rst`
under `autoapi/` at build time and Sphinx needs the suffix to
parse it. The comment in `conf.py` flags this so a future cleanup
pass doesn't strip it again.
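For reviewers who want to picture the two `conf.py` settings described above, here is a minimal sketch. It is not copied from the PR: the `extensions` list and the comments are assumptions, and the real `conf.py` may order or word things differently. Only `myst_enable_extensions`, `colon_fence`, `deflist`, and the retained `.rst` suffix come from the description.

```python
# Hypothetical excerpt of docs/source/conf.py; the actual file may differ.

extensions = [
    "myst_parser",        # parses the MyST .md sources
    "autoapi.extension",  # sphinx-autoapi; writes .rst under autoapi/ at build time
]

# MyST syntax extensions that the converter output relies on.
myst_enable_extensions = [
    "colon_fence",  # ::: fenced directives
    "deflist",      # definition lists
]

# Keep .rst registered. No human-authored RST remains, but sphinx-autoapi
# generates .rst files at build time and Sphinx needs the suffix to parse them.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```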
86 `{eval-rst}` blocks remain in the converted output. Every one of
them wraps a `.. ipython::` directive, which has no first-class MyST
equivalent in our extensions setup. The blocks render identically
and don't block the build. Migrating these to a native MyST exec
syntax is a follow-up that requires either `myst-nb` or a custom
parser registration — out of scope here.
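The block count above could be re-checked with a small script along these lines. This is a sketch rather than anything shipped in the PR: the function names are hypothetical, and it only recognizes backtick-fenced `{eval-rst}` blocks, which is assumed to be the form the converter emits.

```python
import re
from pathlib import Path

# Build the fence marker programmatically so this snippet can itself live
# inside a fenced code block.
FENCE = "`" * 3

# Matches one backtick-fenced {eval-rst} block and captures its body.
EVAL_RST_BLOCK = re.compile(
    rf"^{FENCE}\{{eval-rst\}}[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def count_blocks(markdown_text: str) -> tuple[int, int]:
    """Return (total eval-rst blocks, blocks that start with `.. ipython::`)."""
    bodies = EVAL_RST_BLOCK.findall(markdown_text)
    ipython = sum(1 for body in bodies if body.lstrip().startswith(".. ipython::"))
    return len(bodies), ipython


def count_blocks_in_tree(docs_root: Path) -> tuple[int, int]:
    """Sum the per-file counts over every .md file under docs_root."""
    total = ipython = 0
    for path in sorted(docs_root.rglob("*.md")):
        file_total, file_ipython = count_blocks(path.read_text(encoding="utf-8"))
        total += file_total
        ipython += file_ipython
    return total, ipython
```

Calling `count_blocks_in_tree(Path("docs/source"))` from the repository root should return two equal numbers if every remaining block really wraps an `.. ipython::` directive.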
`AGENTS.md` is updated so the two `.rst` paths called out under
"Aggregate and Window Function Documentation" point at the new `.md`
equivalents.
# Are there any user-facing changes?
No behavioral change to the `datafusion` package — only the source
format of the published documentation. Readers of the rendered site
will not notice the migration; the HTML output is unchanged. Internal
cross-references resolve, and the `pokemon.csv` ipython example on the
landing page and the `yellow_tripdata_2021-01.parquet` example on
the basics page both still execute.
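One way to spot-check that cross-references resolve, independent of a full Sphinx build, is to compare `{ref}` targets against `(label)=` anchors. The sketch below is illustrative only: the helper name and the in-memory `sources` mapping are hypothetical, it covers MyST role syntax but not `:ref:` roles in Python docstrings, and it ignores Sphinx's own label normalization.

```python
import re

# A MyST anchor sits on its own line, e.g. "(io_csv)=".
ANCHOR = re.compile(r"^\(([^)\s]+)\)=[ \t]*$", re.MULTILINE)

# A MyST ref role: either {ref}`Title <target>` or {ref}`target`.
REF = re.compile(r"\{ref\}`(?:[^`<]*<([^`>]+)>|([^`]+))`")


def unresolved_refs(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each file name to the {ref} targets that have no matching anchor."""
    anchors: set[str] = set()
    for text in sources.values():
        anchors.update(ANCHOR.findall(text))

    missing: dict[str, set[str]] = {}
    for name, text in sources.items():
        # Each match is a (titled_target, bare_target) pair; one side is empty.
        targets = {(titled or bare).strip() for titled, bare in REF.findall(text)}
        unresolved = targets - anchors
        if unresolved:
            missing[name] = unresolved
    return missing
```

A kebab-cased anchor such as `(io-parquet)=` paired with a reference to `io_parquet` would show up in the result, which is the mismatch the anchor rewrite in this PR avoids.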
No `api change` label — public APIs untouched.
## Follow-ups (out of scope for this PR)
- Migrate the 86 `{eval-rst}` `.. ipython::` blocks to a
MyST-native exec syntax. Requires either pulling in `myst-nb` or
configuring a per-language parser.
- Phase 3: multi-version doc publishing (the comet pattern).
- Phase 4: `asf-site` publishing workflow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]