[jira] [Commented] (SPARK-42691) Implement Dataset.semanticHash
[ https://issues.apache.org/jira/browse/SPARK-42691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698935#comment-17698935 ]

Apache Spark commented on SPARK-42691:
--------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40366

> Implement Dataset.semanticHash
> ------------------------------
>
>                 Key: SPARK-42691
>                 URL: https://issues.apache.org/jira/browse/SPARK-42691
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Implement Dataset.semanticHash:
> {code:java}
> /**
>  * Returns a `hashCode` of the logical query plan against this [[Dataset]].
>  *
>  * @note Unlike the standard `hashCode`, the hash is calculated against the query plan
>  * simplified by tolerating the cosmetic differences such as attribute names.
>  * @since 3.4.0
>  */
> @DeveloperApi
> def semanticHash(): Int{code}
> This has to be computed on the spark connect server to do this. Please extend the
> AnalyzePlanRequest and AnalyzePlanResponse messages for this.
> Also make sure this works in PySpark.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
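Editor's note: the scaladoc quoted in the issue describes a hash computed over a simplified query plan that tolerates cosmetic differences such as attribute names. The following is a toy sketch of that idea in plain Python, not Spark's implementation and not the Spark Connect protocol. `Node`, `canonical_form`, `semantic_hash`, and `filtered_range` are hypothetical names introduced only for this illustration, and the choice of a truncated SHA-256 digest is an assumption made to get a stable signed 32-bit value.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Toy plan node (hypothetical; not Spark's LogicalPlan)."""
    op: str                          # operator name, e.g. "Range" or "Filter"
    attrs: Tuple[str, ...] = ()      # attribute names the node refers to
    literals: Tuple[int, ...] = ()   # constants that do affect the result
    children: Tuple["Node", ...] = ()


def canonical_form(node: Node, names: Optional[Dict[str, int]] = None) -> tuple:
    """Replace attribute names with positions in order of first appearance."""
    if names is None:
        names = {}
    # Visit children first so attributes are numbered where they are produced.
    kids = tuple(canonical_form(child, names) for child in node.children)
    ids = tuple(names.setdefault(attr, len(names)) for attr in node.attrs)
    return (node.op, ids, node.literals, kids)


def semantic_hash(node: Node) -> int:
    """Hash the canonical form into a signed 32-bit integer."""
    digest = hashlib.sha256(repr(canonical_form(node)).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


def filtered_range(column: str, threshold: int) -> Node:
    """Build Filter(column > threshold) over Range(0, 10) producing `column`."""
    source = Node("Range", attrs=(column,), literals=(0, 10))
    return Node("Filter", attrs=(column,), literals=(threshold,), children=(source,))


plan_a = filtered_range("id", 5)
plan_b = filtered_range("n", 5)   # same plan, different attribute name
plan_c = filtered_range("id", 6)  # different literal, so a different plan

print(plan_a == plan_b)                                  # False: plain equality sees the name
print(semantic_hash(plan_a) == semantic_hash(plan_b))    # True: canonical forms match
print(canonical_form(plan_a) == canonical_form(plan_c))  # False: the literal differs
```

The sketch numbers attributes in the order they are first produced, so renaming a column leaves the canonical form unchanged while a change to a literal does not. How Spark actually canonicalizes a plan is outside what the issue text states.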
[jira] [Commented] (SPARK-42691) Implement Dataset.semanticHash
[ https://issues.apache.org/jira/browse/SPARK-42691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698872#comment-17698872 ]

jiaan.geng commented on SPARK-42691:
------------------------------------

I will take a look!