(Note: I also filed this as a JIRA [1] a few days ago, but I noticed that
the mailing list seems to be a better place for opening discussions.)

I've been enjoying diving into Beam recently, but to my frustration I've
found that I often need to look through the source code to discover APIs.

Beam has some really nice documentation on its website (I particularly love
the "transform catalog") but I find the "API docs" [2] to be nearly
unusable, at least for the Python SDK. For example, try clicking on any of
the sub-headings, e.g., apache_beam.io [3]. It's a long, heavily nested
listing of the raw internal structure of Beam's Python modules.

To enumerate my concerns:
1. It's hard to navigate. I need to know exactly where a function is
defined to find it. E.g., to find beam.Map, I had to click on
"apache_beam.transforms package" followed by "apache_beam.transforms.core
module" and then scroll down or search in the page for "Map."
2. It isn't clear exactly which components are public APIs. The
documentation for a few modules notes that they are not public, but there
are so many others listed that I'm sure they cannot all be intended for
public support. This makes it hard to find Beam's main public APIs.
3. It isn't clear which import paths are preferred. For example,
apache_beam.Map is documented as apache_beam.transforms.core.Map, without
any mention of the shorter name.

I suspect the source of most of these issues is that the API docs make
heavy use of Sphinx's autodoc for modules. In my experience maintaining
Python projects, this just doesn't work very well. autosummary and
autofunction on individual functions/classes work well, but the pages need
to be organized by hand; you can't count on automodule to do a good job of
high-level organization. JAX's docs are a good example, e.g., see the source
code [4] and rendered HTML [5].
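To illustrate what "organized by hand" could mean, here is a minimal sketch that renders one reStructuredText page from a hand-curated grouping of public names, using the real autosummary directive with `:toctree:` (as JAX's docs do). The helper `render_api_page`, the `API_SECTIONS` grouping, and the section titles are hypothetical and only for illustration; in practice the .rst would more likely be written directly. Because of `.. currentmodule:: apache_beam`, entries such as `Map` are resolved and documented under the short path apache_beam.Map.

```python
# Hypothetical helper: render a hand-curated Sphinx page that lists public
# APIs under their short import paths, using the autosummary directive.
from typing import Dict, List

# Curated by hand: section title -> names relative to the current module.
# This grouping is illustrative only, not Beam's actual or proposed layout.
API_SECTIONS: Dict[str, List[str]] = {
    "Core transforms": ["Map", "FlatMap", "ParDo"],
    "I/O": ["io.ReadFromText", "io.WriteToText"],
}


def render_api_page(module: str, sections: Dict[str, List[str]]) -> str:
    """Return reStructuredText for one API page, one autosummary per section."""
    # Page title, then set the module that autosummary names are relative to.
    lines = [module, "=" * len(module), "", f".. currentmodule:: {module}", ""]
    for title, names in sections.items():
        # Section heading followed by an autosummary table for that section.
        lines += [title, "-" * len(title), ""]
        lines += [".. autosummary::", "   :toctree: _autosummary", ""]
        lines += [f"   {name}" for name in names]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_api_page("apache_beam", API_SECTIONS))
```

The point of the design is that the grouping and the set of listed names are explicit decisions, rather than whatever automodule finds by walking the package tree.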

This would definitely be a bit of work, but it is relatively straightforward
to set up and I think it would pay big dividends for discoverability of Beam's
API. I've gone through this process a few times for different projects, so
I would be happy to advise if/as issues come up.

Cheers,
Stephan

[1] https://issues.apache.org/jira/browse/BEAM-12235
[2] https://beam.apache.org/releases/pydoc/2.28.0/index.html
[3] https://beam.apache.org/releases/pydoc/2.28.0/apache_beam.io.html
[4] https://github.com/google/jax/blob/master/docs/jax.rst
[5] https://jax.readthedocs.io/en/latest/jax.html
