Hi Kevin,
There are a couple of concerns to keep in mind:
- we don't want to increase the import time of PyArrow too much
- we would like to limit the required runtime dependencies for PyArrow
(an issue is open to move docstring generation to package build time:
https://issues.apache.org/jira/browse/ARROW-12526)
As for your proposal, it sounds like an interesting idea but the devil
may lie in the details, so it would be good to see an actual implementation.
Regards
Antoine.
On 26/04/2022 at 04:08, Kevin Crouse wrote:
Hi everyone,
Sorry if some of this is out of place or not in the right dev email
structure. I've only recently started getting into the arrow dev stuff.
*Summary*: I'm interested in improving the API and functional
documentation, especially for the pyarrow compute functions as I've been
doing some deep implementation and finding issues with some doc examples
being wrong and most functions not having examples. I think this is mostly
because the function docs are inherited from the cpp tree. That makes sense
to keep things in sync with the C++ library, which is the ultimate source.
*Why*: There is almost no way to customize the documentation to reflect the
pythonic features and abilities. There is a very small ability to provide
additional information by writing a docstring appendix in
python/pyarrow/_compute_docstrings.py, but nothing to modify or add to the
description or change the details on the parameters. I feel this is not
ideal because:
1. it only adds supplemental details,
2. there's no easy way to test example code as you are writing it,
3. trying to figure out where any given documentation comes from (in
order to improve it) requires you to trace your way through a lot of
modules,
4. it feels unnecessarily complex. In order to add an example, we write
reST-style docstring parts into a python module just so they can be
reconstituted into a regular functional docstring in another module
(python/pyarrow/compute.py), which is then used to build the actual reST
docs when the docs are built.
*Proposal?*:
How about having a subdirectory for doc additions written in
reStructuredText that look a lot like regular functional docs? This provides
a single, easy to find location for the custom python docs (solving #3 and
some of #4), and examples can be tested with doctest (solving #2). Then,
write a function to parse the reST file and merge the details there with the
function docs pulled in from the cpp library in
python/pyarrow/compute.py. This flexibly lets us add examples, notes,
or extra python-specific additions easily (solving #1). And, in cases where
a parameter is defined in the reST addition file, it will supplant the text
pulled from the cpp tree. If there's no need to provide extra details,
omitting the Parameters section just defaults to the current cpp docs.
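
To make the merge step a bit more concrete, here is a rough sketch of what I have in mind. It is not the prototype itself and not existing pyarrow code: the helper names (split_sections, split_params, merge_docs), the sample docstrings, and the assumption that both inputs use numpydoc-style section headers underlined with dashes are all made up for illustration.

```python
import re

# Hypothetical helpers; none of these names exist in pyarrow.
# Assumes numpydoc-style sections: a title line underlined with dashes.
SECTION_RE = re.compile(r"^(?P<title>[A-Za-z ]+)\n-{3,}[ \t]*\n", re.MULTILINE)


def split_sections(doc):
    """Return {section title: body}; text before the first section uses key ''."""
    matches = list(SECTION_RE.finditer(doc))
    head_end = matches[0].start() if matches else len(doc)
    sections = {"": doc[:head_end].strip()}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(doc)
        sections[m.group("title").strip()] = doc[m.end():end].strip()
    return sections


def split_params(body):
    """Return {parameter name: full entry text} for a Parameters section body."""
    params, current = {}, None
    for line in body.splitlines():
        if line and not line[0].isspace():
            # An unindented line such as "x : type" starts a new parameter.
            current = line.split(":")[0].strip()
            params[current] = line
        elif current is not None:
            params[current] += "\n" + line
    return params


def merge_docs(cpp_doc, override_doc):
    """Merge a Python-side addition into the docstring generated from C++."""
    base, extra = split_sections(cpp_doc), split_sections(override_doc)
    merged = dict(base)
    for title, body in extra.items():
        if title == "Parameters" and "Parameters" in base:
            # Parameters named in the addition supplant the C++ text;
            # the others keep the C++ description.
            params = split_params(base["Parameters"])
            params.update(split_params(body))
            merged[title] = "\n".join(params.values())
        elif body:
            # Any other non-empty section (Examples, Notes, ...) is added
            # or replaces the section of the same name.
            merged[title] = body
    parts = []
    for title, body in merged.items():
        parts.append(body if title == "" else f"{title}\n{'-' * len(title)}\n{body}")
    return "\n\n".join(parts)


# Made-up inputs standing in for a generated docstring and a reST addition.
cpp_doc = """Add the arguments element-wise.

Parameters
----------
x : Array-like or scalar-like
    Argument to compute function.
y : Array-like or scalar-like
    Argument to compute function.
"""

override_doc = """Parameters
----------
y : Array-like or scalar-like
    Right-hand operand (Python-specific wording).

Examples
--------
>>> 1 + 1
2
"""

merged = merge_docs(cpp_doc, override_doc)
```

Because the addition files would be plain text with `>>>` examples, they could also be checked on their own with something like `python -m doctest path/to/addition.rst` while writing them (the path here is a placeholder).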
I realize that may all be hard to follow, especially if you haven't been
deep in the python docs. I quickly threw together a prototype if this
sounds like a useful path forward.
Best,
Kevin