Attendees: Nic Crane, Micah Kornfeld, Eduardo Ponce, Will Jones, Rok Mihevc, David Li, Niranda Perera, Benson Muite

Agenda:
- Discussion about the new columnar memory layout
- Preparing for the 7.0.0 release (2nd or 3rd week of January)
- Documentation improvement
- Support for table-like structures (Apache Iceberg, Delta Lake)

Minutes:
- Not enough stakeholders were on the call to discuss the new layout proposal [1]. Micah might chime in on the ML.
- The 7.0.0 release is scheduled for the 2nd or 3rd week of January. Please plan to complete PRs for 7.0.0 in time, or bump the Fix Version from 7.0.0 to 8.0.0 on any Jira issues not expected to be resolved in time [2]. See [3] to track the progress of the release.
- Eduardo proposed a discussion of documentation improvement. The main pain point is the sparse documentation of the C++ compute kernels available to users: the Cookbook is not very extensive yet, and there are few public examples of usage. Just browsing the public API shows many undocumented functionalities. Functions are documented in code with docstrings, but these are not used for documentation (?). There is a table of kernels [4], but it could be more verbose. Could we use the docstrings? Jon says the R wrapper can pull C++ docstrings for its documentation, but the mapping of functionality is not always one-to-one. Eduardo: another pain point is that internal abstractions are not well documented, which stalls new committers. Eduardo will open a PR for this. There are already two PRs in review to improve kernel docs: [5], [6].
- Support for table-like structures: Micah asked whether there is any progress in this area. Will looked into this and opened two Jiras, for Delta Lake [7] and Iceberg [8]. Technically there are no issues with implementing readers for either option, but there are some worries about governance, maintenance, and licensing. We don't have a reader for Avro, hence Will first looked into Delta Lake via the Rust reader.

[1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
[2] https://lists.apache.org/thread/ng11x17yhvdfo8b3wgmd1qn40hy50g13
[3] https://cwiki.apache.org/confluence/display/ARROW/Arrow+7.0.0+Release
[4] https://arrow.apache.org/docs/cpp/compute.html
[5] https://github.com/apache/arrow/pull/10296 - ARROW-12724: [C++] Add documentation for authoring compute kernels
[6] https://github.com/apache/arrow/pull/12076 - ARROW-10317: [Python] Document compute function options
[7] https://issues.apache.org/jira/browse/ARROW-14730
[8] https://issues.apache.org/jira/browse/ARROW-15135