Christopher Boumalhab created SPARK-54179:
---------------------------------------------
Summary: Add Native Support for Apache Tuple Sketches
Key: SPARK-54179
URL: https://issues.apache.org/jira/browse/SPARK-54179
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.2.0
Reporter: Christopher Boumalhab
Implement support for tuple sketches (from the Apache DataSketches library)
in Apache Spark to enable efficient approximate computation of set
cardinality, frequency, and similarity across multiple dimensions. The
feature should:
* Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
* Provide functions for creating, updating, and querying tuple sketches.
* Support common sketch operations such as union, intersection, and
cardinality estimation.
* Ensure compatibility with Spark SQL and allow usage within DataFrame
transformations and aggregations.
* Include unit and integration tests validating accuracy and performance.
* Provide documentation and examples for developers.
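
To make the requested operations concrete, here is a minimal toy model of a
tuple sketch in plain Python. It is only an illustration under stated
assumptions: the class ToyTupleSketch, the helper hash01, and all method
names are hypothetical, the summary is assumed to be a running sum, and none
of this reflects the DataSketches API or any proposed Spark function
signature.

```python
import hashlib


def hash01(key):
    """Map a key to a pseudo-uniform float in [0, 1) using 53 hash bits."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


class ToyTupleSketch:
    """Hypothetical toy: keeps the k smallest key hashes, each with a summed value."""

    def __init__(self, k=1024):
        self.k = k
        self.theta = 1.0   # sampling threshold; only hashes below it are retained
        self.entries = {}  # key hash -> summary (running sum of values)

    def update(self, key, value=1.0):
        h = hash01(key)
        if h < self.theta:
            self.entries[h] = self.entries.get(h, 0.0) + value
            self._trim()

    def _trim(self):
        # Keep the k smallest hashes; theta drops to the (k+1)-th smallest.
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]
            for h in ordered[self.k:]:
                del self.entries[h]

    def estimate(self):
        """Approximate number of distinct keys seen."""
        return len(self.entries) / self.theta

    def summary_estimate(self):
        """Approximate sum of all values across all keys."""
        return sum(self.entries.values()) / self.theta

    def union(self, other):
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for src in (self, other):
            for h, s in src.entries.items():
                if h < out.theta:
                    out.entries[h] = out.entries.get(h, 0.0) + s
        out._trim()
        return out

    def intersect(self, other):
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for h, s in self.entries.items():
            # Retain only hashes present in both inputs and below the threshold.
            if h < out.theta and h in other.entries:
                out.entries[h] = s + other.entries[h]
        return out


# Usage: two overlapping groups of users, each update carrying a value of 1.0.
a, b = ToyTupleSketch(), ToyTupleSketch()
for i in range(60):
    a.update(f"user-{i}")
for i in range(40, 100):
    b.update(f"user-{i}")
print(a.union(b).estimate())      # 100.0 (exact: fewer than k distinct keys)
print(a.intersect(b).estimate())  # 20.0
```

With fewer than k distinct keys the toy stays in exact mode, which is why the
usage results are exact. Once more than k keys arrive, theta falls below 1.0
and the results become estimates.
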
*Acceptance Criteria:*
# Sketches support aggregation and merging operations.
# Queries return approximate cardinalities or other statistics within the
expected error bounds.
# Performance benchmarks show scalability for large datasets.
# Documentation includes API usage examples.
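
As an illustration of how the error-bound criterion might be checked, the
sketch below estimates a distinct count from the k+1 smallest key hashes and
compares the result against a multiple of the relative standard error. The
function names (hash01, kmv_estimate, within_bounds) are hypothetical, and
the figure of roughly 1/sqrt(k - 1) is an assumption that applies to this
simple estimator; the bounds for the real sketches should come from the
DataSketches documentation.

```python
import hashlib
import heapq
import math


def hash01(key):
    """Map a key to a pseudo-uniform float in [0, 1) using 53 hash bits."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


def kmv_estimate(keys, k):
    """Estimate the number of distinct keys from the k+1 smallest distinct hashes."""
    smallest = heapq.nsmallest(k + 1, {hash01(key) for key in keys})
    if len(smallest) <= k:
        return float(len(smallest))  # exact mode: at most k distinct hashes seen
    theta = smallest[k]              # the (k+1)-th smallest hash is the threshold
    return k / theta


def within_bounds(estimate, truth, k, num_std=3.0):
    """True if the estimate is within num_std assumed standard errors of truth."""
    rse = 1.0 / math.sqrt(k - 1)     # assumed relative standard error
    return abs(estimate - truth) <= num_std * rse * truth


# Usage: compare the observed relative error with the assumed standard error.
truth, k = 50000, 1024
est = kmv_estimate(range(truth), k)
print("relative error:", abs(est - truth) / truth)
print("assumed RSE:   ", 1.0 / math.sqrt(k - 1))
print("within 3 standard errors:", within_bounds(est, truth, k))
```

A real accuracy test would repeat this over many independent key sets and
several values of k rather than relying on a single run.
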