[ 
https://issues.apache.org/jira/browse/SPARK-54179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058036#comment-18058036
 ] 

Cheng Pan commented on SPARK-54179:
-----------------------------------

[~cboumalh] The problem is that datasketches-java 6.2, the version Spark uses, 
does not support Java 25. I'm working with the DataSketches community to remove 
such restrictions and unblock Spark's Java 25 support. Once Spark drops Java 17 
and moves its baseline to Java 21 or 25, we can upgrade datasketches-java to 
newer versions.

> Add Native Support for Apache Tuple Sketches
> --------------------------------------------
>
>                 Key: SPARK-54179
>                 URL: https://issues.apache.org/jira/browse/SPARK-54179
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Christopher Boumalhab
>            Assignee: Christopher Boumalhab
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> Implement support for tuple sketches in Apache Spark to enable efficient 
> approximate set cardinality, frequency, and similarity computations over 
> multiple dimensions. The feature should:
>  * Integrate tuple sketches with Spark’s DataFrame and RDD APIs.
>  * Provide functions for creating, updating, and querying tuple sketches.
>  * Support common sketch operations such as union, intersection, and 
> cardinality estimation.
>  * Ensure compatibility with Spark SQL and allow usage within DataFrame 
> transformations and aggregations.
>  * Include unit and integration tests validating accuracy and performance.
>  * Provide documentation and examples for developers.
> *Acceptance Criteria:*
> 1. Sketches support aggregation and merging operations.
> 2. Queries return approximate cardinalities or other statistics with expected 
> error bounds.
> 3. Performance benchmarks show scalability for large datasets.
> 4. Documentation includes API usage examples.
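For readers unfamiliar with tuple sketches, the operations listed above 
(update, union, intersection, cardinality estimation) can be illustrated with a 
toy KMV/theta-style sketch in Python. This is a hypothetical teaching sketch, 
not the datasketches-java implementation and not the Spark API added by this 
issue: the names TupleSketch, union, intersection and estimate_sum are made up, 
the summary attached to each retained hash is assumed to be a running sum of 
doubles, and blake2b stands in for the library's own hash function.

```python
import hashlib


def _hash01(key: str) -> float:
    """Map a key to a stable pseudo-random point in [0, 1) (assumed hash, not DataSketches')."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    # Keep the top 53 bits so the result is exactly representable and < 1.0.
    return (int.from_bytes(digest, "big") >> 11) / float(1 << 53)


class TupleSketch:
    """Hypothetical toy tuple sketch: retains at most k (hash -> summary) pairs
    with hash < theta. Each summary here is a running sum of double values."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.theta = 1.0   # sampling threshold; 1.0 means exact mode
        self.entries = {}  # hash -> summary (sum of values)

    def _trim(self) -> None:
        # Keep only the k smallest hashes; the (k+1)-th becomes the new theta.
        if len(self.entries) > self.k:
            self.theta = sorted(self.entries)[self.k]
            self.entries = {h: s for h, s in self.entries.items() if h < self.theta}

    def update(self, key: str, value: float) -> None:
        h = _hash01(key)
        if h >= self.theta:
            return  # key falls outside the current sample
        self.entries[h] = self.entries.get(h, 0.0) + value
        self._trim()

    def estimate(self) -> float:
        # Distinct-count estimate: retained entries divided by the sampling rate.
        return len(self.entries) / self.theta

    def estimate_sum(self) -> float:
        # Scale the retained summaries by the same sampling rate.
        return sum(self.entries.values()) / self.theta


def union(a: TupleSketch, b: TupleSketch, k: int) -> TupleSketch:
    out = TupleSketch(k)
    out.theta = min(a.theta, b.theta)
    for src in (a, b):
        for h, s in src.entries.items():
            if h < out.theta:
                # Keys seen in both inputs have their summaries summed.
                out.entries[h] = out.entries.get(h, 0.0) + s
    out._trim()
    return out


def intersection(a: TupleSketch, b: TupleSketch) -> TupleSketch:
    out = TupleSketch(min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    for h, s in a.entries.items():
        if h < out.theta and h in b.entries:
            out.entries[h] = s + b.entries[h]  # assumed summary merge: sum
    return out
```

The design choice shown is the one theta-family sketches share: keep only 
hashes below a threshold theta, estimate the distinct count as retained/theta, 
and carry a summary alongside each retained hash so that aggregates such as sums 
can be scaled the same way. The relative error shrinks roughly as 1/sqrt(k).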



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
