kou opened a new pull request, #43553:
URL: https://github.com/apache/arrow/pull/43553
### Rationale for this change
Statistics are useful for fast query processing. Many query engines
use statistics to optimize their query plans.
The Apache Arrow format doesn't carry statistics, but other formats that
can be read as Apache Arrow data may. For example, Apache
Parquet C++ can read an Apache Parquet file as Apache Arrow data, and
an Apache Parquet file may contain statistics.
One of the Arrow C data interface use cases is the following:
1. Module A reads an Apache Parquet file as Apache Arrow data
2. Module A passes the read Apache Arrow data to module B through the
Arrow C data interface
3. Module B processes the passed Apache Arrow data
If module A can pass the statistics associated with the Apache Parquet
file to module B through the Arrow C data interface, module B can use
the statistics to optimize its query plan.
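
As a rough illustration of why module B benefits, here is a minimal pure-Python sketch of the flow above. It does not use the Arrow C data interface or the layout proposed in this PR; `ColumnStatistics`, `Batch`, `module_a_read`, and `module_b_filter_greater_than` are hypothetical names invented for this example, and the min/max/null-count fields are only assumed to resemble what a Parquet reader might expose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnStatistics:
    """Hypothetical per-batch statistics; not the layout defined by this PR."""
    min_value: int
    max_value: int
    null_count: int


@dataclass
class Batch:
    """Stand-in for one chunk of Arrow data handed from module A to module B."""
    values: List[Optional[int]]
    statistics: Optional[ColumnStatistics] = None


def module_a_read(chunks, with_statistics=True):
    """Module A: produce batches, optionally attaching statistics to each one."""
    batches = []
    for chunk in chunks:
        stats = None
        non_null = [v for v in chunk if v is not None]
        if with_statistics and non_null:
            stats = ColumnStatistics(
                min(non_null), max(non_null), len(chunk) - len(non_null)
            )
        batches.append(Batch(list(chunk), stats))
    return batches


def module_b_filter_greater_than(batches, threshold):
    """Module B: evaluate `value > threshold`, skipping batches ruled out by statistics.

    Returns the matching values and the number of batches actually scanned.
    """
    matched = []
    scanned = 0
    for batch in batches:
        stats = batch.statistics
        if stats is not None and stats.max_value <= threshold:
            continue  # the maximum is too small, so no value here can match
        scanned += 1
        matched.extend(v for v in batch.values if v is not None and v > threshold)
    return matched, scanned


chunks = [[1, 2, 3], [10, None, 12], [4, 5, 6]]
print(module_b_filter_greater_than(module_a_read(chunks), 9))
print(module_b_filter_greater_than(module_a_read(chunks, with_statistics=False), 9))
```

With statistics attached, module B scans only the one batch whose maximum exceeds the threshold; without them it has to scan all three batches to produce the same result.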
### What changes are included in this PR?
Add a specification for passing statistics through the Arrow C data
interface, based on the discussion on the `dev@` mailing list:
https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
### Are these changes tested?
Yes.
### Are there any user-facing changes?
Yes.