Christopher Boumalhab created SPARK-54179:
---------------------------------------------

             Summary: Add Native Support for Apache Tuple Sketches
                 Key: SPARK-54179
                 URL: https://issues.apache.org/jira/browse/SPARK-54179
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Christopher Boumalhab


Implement support for Apache DataSketches tuple sketches in Apache Spark to
enable efficient approximate set cardinality, frequency, and similarity
computations over multiple dimensions. The feature should:
 * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.

 * Provide functions for creating, updating, and querying tuple sketches.

 * Support common sketch operations such as union, intersection, and 
cardinality estimation.

 * Ensure compatibility with Spark SQL and allow usage within DataFrame 
transformations and aggregations.

 * Include unit and integration tests validating accuracy and performance.

 * Provide documentation and examples for developers.
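
To make the operations listed above concrete, here is a toy, pure-Python illustration of what a tuple sketch does: it keeps a bounded sample of key hashes, attaches a summed value (the "summary") to each retained key, and scales counts by the sampling threshold theta. This is only a sketch under stated assumptions. The class `ToyTupleSketch` and the helpers `union` and `intersection` are hypothetical names invented for this example; they are neither the Apache DataSketches implementation nor a proposed Spark API.

```python
import hashlib


class ToyTupleSketch:
    """Toy KMV-style tuple sketch (hypothetical; not the DataSketches API)."""

    def __init__(self, k=1024):
        self.k = k          # maximum number of retained entries
        self.theta = 1.0    # sampling threshold; hashes >= theta are ignored
        self.entries = {}   # key hash in [0, 1) -> summed value (the summary)

    @staticmethod
    def _hash(key):
        # Map a key to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def update(self, key, value=1.0):
        h = self._hash(key)
        if h < self.theta:
            self.entries[h] = self.entries.get(h, 0.0) + value
            self._trim()

    def _trim(self):
        # Keep the k smallest hashes; theta becomes the (k+1)-th smallest.
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]
            for h in ordered[self.k:]:
                del self.entries[h]

    def estimate(self):
        # Approximate number of distinct keys seen.
        return len(self.entries) / self.theta

    def estimate_sum(self):
        # Approximate sum of all values over all keys.
        return sum(self.entries.values()) / self.theta


def union(a, b):
    out = ToyTupleSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    for sketch in (a, b):
        for h, v in sketch.entries.items():
            if h < out.theta:
                out.entries[h] = out.entries.get(h, 0.0) + v
    out._trim()
    return out


def intersection(a, b):
    out = ToyTupleSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    for h, v in a.entries.items():
        if h < out.theta and h in b.entries:
            out.entries[h] = v + b.entries[h]
    return out


# Small inputs stay below k, so theta is 1.0 and the results are exact.
x = ToyTupleSketch(k=16)
y = ToyTupleSketch(k=16)
for key in ("a", "b"):
    x.update(key)
for key in ("b", "c"):
    y.update(key)
print(union(x, y).estimate())         # 3.0
print(intersection(x, y).estimate())  # 1.0
```

Once more than k distinct keys arrive, theta drops below 1.0 and the estimates become approximate, with accuracy governed by k. A native Spark integration would presumably expose the same create, update, merge, and estimate steps as aggregate and scalar SQL functions rather than as a Python class.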

*Acceptance Criteria:*
 # Sketches support aggregation and merging operations.

 # Queries return approximate cardinalities or other statistics with expected 
error bounds.

 # Performance benchmarks show scalability for large datasets.

 # Documentation includes API usage examples.


