Hi all,

Regarding which examples or information would be worth adding or improving in the documentation: I find that browsing GitHub issues [1] is a good way to identify how users are actually using Arrow. Moreover, many of the GH issues about code examples contain snippets in the responses that demonstrate a possible approach, and these could be considered for documentation examples and/or the Arrow Cookbook [2].

[1] https://github.com/apache/arrow/issues?q=is%3Aissue
[2] https://arrow.apache.org/cookbook

On Wed, Jan 5, 2022 at 1:21 PM Rok Mihevc <rok.mih...@gmail.com> wrote:

> Attendees
>
> Nic Crane, Micah Kornfeld, Eduardo Ponce, Will Jones, Rok Mihevc,
> David Li, Niranda Perera, Benson Muite
>
>
> Agenda
>
> - Discussion about the new columnar memory layout
> - Preparing for 7.0.0 release - 2nd or 3rd week of January
> - Documentation improvement
> - Support for table-like structures (Apache Iceberg, Delta Lake)
>
>
> Minutes
>
> - Not enough stakeholders on the call to discuss the new layout
> proposal [1]. Micah might chime in on ML.
>
> - 7.0.0 release is scheduled for the 2nd or 3rd week of January.
> Please plan to complete PRs for 7.0.0 in time, or bump Fix Version
> from 7.0.0 to 8.0.0 in your Jira issues that are not expected to be
> resolved in time [2]. See [3] to track the progress of the release.
>
> - Eduardo proposed discussion of documentation improvement. Main
> pain point is the sparse documentation of C++ compute kernels
> available to users: the Cookbook is not very extensive yet, and there
> are few public examples of usage. Just browsing the public API shows
> many undocumented functionalities. Functions are documented in code
> with docstrings, but these are not used for documentation (?). There
> is a table of kernels [4] but it could be more verbose. Could we use
> docstrings?
> Jon says the R wrapper can pull C++ docstrings for its documentation,
> but the mapping of functionality is not always 1-to-1.
> Eduardo: Another pain point is that internal abstractions are not
> well documented, which stalls new committers. Eduardo will open a PR
> for this. There are already two PRs in review to improve kernel docs:
> [5], [6].
>
> - Support for table-like structures discussion - Micah is interested
> in whether there is any progress in this area. Will looked into this
> and opened two Jiras, for Delta Lake [7] and Iceberg [8]. Technically
> there are no issues implementing readers for either option, but there
> are some worries about governance/maintenance/licensing. We don’t have
> a reader for Avro, hence Will first looked into Delta Lake via the
> Rust reader.
>
>
> [1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> [2] https://lists.apache.org/thread/ng11x17yhvdfo8b3wgmd1qn40hy50g13
> [3] https://cwiki.apache.org/confluence/display/ARROW/Arrow+7.0.0+Release
> [4] https://arrow.apache.org/docs/cpp/compute.html
> [5] https://github.com/apache/arrow/pull/10296 - ARROW-12724: [C++]
> Add documentation for authoring compute kernels
> [6] https://github.com/apache/arrow/pull/12076 - ARROW-10317: [Python]
> Document compute function options
> [7] https://issues.apache.org/jira/browse/ARROW-14730
> [8] https://issues.apache.org/jira/browse/ARROW-15135