[ https://issues.apache.org/jira/browse/SPARK-42691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698935#comment-17698935 ]
Apache Spark commented on SPARK-42691:
--------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40366

> Implement Dataset.semanticHash
> ------------------------------
>
>                 Key: SPARK-42691
>                 URL: https://issues.apache.org/jira/browse/SPARK-42691
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Implement Dataset.semanticHash:
> {code:java}
> /**
>  * Returns a `hashCode` of the logical query plan against this [[Dataset]].
>  *
>  * @note Unlike the standard `hashCode`, the hash is calculated against the query plan
>  *       simplified by tolerating the cosmetic differences such as attribute names.
>  * @since 3.4.0
>  */
> @DeveloperApi
> def semanticHash(): Int{code}
> This has to be computed on the Spark Connect server. Please extend the
> AnalyzePlanRequest and AnalyzePlanResponse messages for this.
> Also make sure this works in PySpark.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
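To illustrate what "tolerating the cosmetic differences such as attribute names" can mean, here is a minimal toy sketch in Scala. It does not use Spark at all: `Attr`, `Plan`, `Scan`, `Project`, `Filter`, `canonicalize` and `semanticHash` are hypothetical stand-ins defined only for this example, and Catalyst's real plan canonicalization is considerably more involved. Assuming a plan is a tree whose attribute references carry a cosmetic name plus a unique id, the sketch clears the names, renumbers the ids in order of first appearance (children before parents), and hashes the resulting canonical tree, so two plans that differ only in naming get the same hash.

```scala
import scala.collection.mutable

// Hypothetical toy plan model; these are NOT Spark's Catalyst classes.
object SemanticHashSketch {
  // An attribute reference: a cosmetic name plus a unique id.
  final case class Attr(name: String, id: Long)

  sealed trait Plan
  final case class Scan(table: String, output: Seq[Attr]) extends Plan
  final case class Project(cols: Seq[Attr], child: Plan) extends Plan
  final case class Filter(col: Attr, literal: Int, child: Plan) extends Plan

  // Clear attribute names and replace ids with ordinals assigned in
  // order of first appearance, visiting children before parents.
  def canonicalize(plan: Plan): Plan = {
    val ordinals = mutable.Map.empty[Long, Long]
    // The default of getOrElseUpdate is evaluated before insertion,
    // so the first new id maps to 0, the next to 1, and so on.
    def norm(a: Attr): Attr = Attr("", ordinals.getOrElseUpdate(a.id, ordinals.size.toLong))

    def go(p: Plan): Plan = p match {
      case Scan(table, output) =>
        Scan(table, output.map(norm))
      case Project(cols, child) =>
        val c = go(child) // normalize the child first for a deterministic order
        Project(cols.map(norm), c)
      case Filter(col, literal, child) =>
        val c = go(child)
        Filter(norm(col), literal, c)
    }
    go(plan)
  }

  // Case classes have structural hashCodes, so equal canonical trees hash equally.
  def semanticHash(plan: Plan): Int = canonicalize(plan).hashCode
}
```

For example, `Filter(Attr("a", 1), 5, Project(Seq(Attr("a", 1)), Scan("t", Seq(Attr("a", 1), Attr("b", 2)))))` and the same tree with `a`/`b` renamed to `x`/`y` and ids `10`/`20` are unequal as plans but share one `semanticHash`. Note that equal semantic hashes are only a hint that two plans may be equivalent: hash collisions are possible, and plans that compute the same result through structurally different trees will generally hash differently.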