Hi Ian,

Yes, more documentation regarding ORC would be very welcome! I think your list of missing docs is correct:

- It's briefly mentioned in the Python API docs (https://arrow.apache.org/docs/python/api/formats.html#orc-files), but that section is incomplete.
- The C++ reference docs list the OrcFileFormat for the dataset API (https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset13OrcFileFormatE), but not the direct ORC interface (as is done for Parquet at https://arrow.apache.org/docs/cpp/api/formats.html, for which the source lives at https://github.com/apache/arrow/blob/master/docs/source/cpp/api/formats.rst).
- There is indeed no user guide. The source of the Parquet Python doc page lives at https://github.com/apache/arrow/blob/master/docs/source/python/parquet.rst

Best,
Joris

On Wed, 24 Nov 2021 at 04:55, Ian Joiner <[email protected]> wrote:
>
> Hi,
>
> Today I found that pretty much none of our ORC-related work (e.g. ORC
> writer in C++ & Python, Arrow Dataset with ORC) has ever been documented.
> This is something we have to fix or users won’t even be aware that ORC
> support exists, let alone how to use it.
>
> From my understanding it seems that we miss the following docs:
> 1. C++ and Python API reference (partially missing)
> 2. User Guide (entirely absent)
>
> As the person who created and self-assigned
> https://issues.apache.org/jira/browse/ARROW-13231 I’d like to spend the
> next a couple of days fixing it. Could you guys please point me towards
> what actually needs to be revised? In particular where is the source of
> https://arrow.apache.org/docs/python/parquet.html ?
> Really thanks!
>
> Ian
