Hi Ian,

Yes, more documentation regarding ORC would be very welcome! I think
your list of missing docs is correct:

- ORC is briefly mentioned in the Python API docs
(https://arrow.apache.org/docs/python/api/formats.html#orc-files), but
that coverage is incomplete
- The C++ reference docs list the OrcFileFormat for the dataset API
(https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset13OrcFileFormatE),
but not the direct ORC interface (as is done for Parquet at
https://arrow.apache.org/docs/cpp/api/formats.html, for which the
source lives at
https://github.com/apache/arrow/blob/master/docs/source/cpp/api/formats.rst)
- There is indeed no user guide. The source of the Parquet Python user
guide page lives at
https://github.com/apache/arrow/blob/master/docs/source/python/parquet.rst

Best,
Joris

On Wed, 24 Nov 2021 at 04:55, Ian Joiner <[email protected]> wrote:
>
> Hi,
>
> Today I found that pretty much none of our ORC-related work (e.g. ORC
> writer in C++ & Python, Arrow Dataset with ORC) has ever been documented.
> This is something we have to fix or users won’t even be aware that ORC
> support exists, let alone how to use it.
>
> From my understanding, we are missing the following docs:
> 1. C++ and Python API reference (partially missing)
> 2. User Guide (entirely absent)
>
> As the person who created and self-assigned
> https://issues.apache.org/jira/browse/ARROW-13231 I’d like to spend the
> next couple of days fixing it. Could you guys please point me towards
> what actually needs to be revised? In particular where is the source of
> https://arrow.apache.org/docs/python/parquet.html ?
> Thanks a lot!
>
> Ian
