[ 
https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-54179:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
>                 Key: SPARK-54179
>                 URL: https://issues.apache.org/jira/browse/SPARK-54179
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Christopher Boumalhab
>            Priority: Major
>              Labels: pull-request-available
>
> Implement native support for tuple sketches in Apache Spark to enable 
> efficient approximate computation of set cardinality, frequency, and 
> similarity across multiple dimensions. The feature should:
>  * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
>  * Provide functions for creating, updating, and querying tuple sketches.
>  * Support common sketch operations such as union, intersection, and 
> cardinality estimation.
>  * Ensure compatibility with Spark SQL and allow usage within DataFrame 
> transformations and aggregations.
>  * Include unit and integration tests validating accuracy and performance.
>  * Provide documentation and examples for developers.
>
> *Acceptance Criteria:*
>  # Sketches support aggregation and merging operations.
>  # Queries return approximate cardinalities or other statistics with expected 
> error bounds.
>  # Performance benchmarks show scalability for large datasets.
>  # Documentation includes API usage examples.
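
For readers unfamiliar with the data structure, the operations listed above
(create, update, union, intersection, cardinality estimation) can be sketched
in a few lines of plain Python. This is only a toy illustration under stated
assumptions: the class name ToyTupleSketch, its methods, the parameter k, and
the choice of a summed float as the per-key summary are all hypothetical. It
is not the Spark API proposed in this issue and not the Apache DataSketches
implementation.

```python
import hashlib


class ToyTupleSketch:
    """Toy tuple sketch: keeps the k smallest key hashes plus a summary per key.

    Hypothetical illustration only; names and behavior are assumptions.
    """

    def __init__(self, k=1024):
        self.k = k            # maximum number of retained entries
        self.theta = 1.0      # sampling threshold in (0, 1]
        self.entries = {}     # hash fraction -> summed summary value

    @staticmethod
    def _hash(key):
        # Map a key to a pseudo-uniform fraction in [0, 1].
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def _trim(self):
        # If over capacity, keep the k smallest hashes and lower theta
        # to the (k+1)-th smallest hash.
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]
            for h in ordered[self.k:]:
                del self.entries[h]

    def update(self, key, value=1.0):
        # Add a key; repeated keys accumulate their summary value.
        h = self._hash(key)
        if h >= self.theta:
            return
        self.entries[h] = self.entries.get(h, 0.0) + value
        self._trim()

    def estimate(self):
        # Approximate number of distinct keys (exact while theta == 1.0).
        return len(self.entries) / self.theta

    def union(self, other):
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for src in (self, other):
            for h, s in src.entries.items():
                if h < out.theta:
                    out.entries[h] = out.entries.get(h, 0.0) + s
        out._trim()
        return out

    def intersect(self, other):
        out = ToyTupleSketch(min(self.k, other.k))
        out.theta = min(self.theta, other.theta)
        for h, s in self.entries.items():
            if h < out.theta and h in other.entries:
                out.entries[h] = s + other.entries[h]
        return out


# Usage: two overlapping key sets, small enough to stay in exact mode.
a, b = ToyTupleSketch(), ToyTupleSketch()
for i in range(100):
    a.update(f"user-{i}")
for i in range(50, 150):
    b.update(f"user-{i}")
print(a.union(b).estimate(), a.intersect(b).estimate())
```

Once more than k distinct keys have been seen, the estimate becomes
approximate, with relative error on the order of 1/sqrt(k) for this kind of
estimator. A native implementation would presumably expose equivalent
operations as SQL aggregate and scalar functions rather than as a Python class.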



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

